WHAT SHALL WE EXPECT OF THE AQ?
HERBERT A. TOOPS 


AND 
P. M. SYMONDS 


Institute of Educational Research 
Teachers College 


The AQ, or accomplishment quotient, procedure is one of the 
most recent, not to say most promising, acquisitions of the educational 
psychologist. Its implications have been subject to no little confusion 
even among the originators of the technique, as will be evident when 
we consider its derivation in more detail below. 

The educational problem of motivation, as a partial solution 
from which the AQ hypothesis was derived, involves in its widest 
aspects the problems of the ultimate aims of education; the administra- 
tive problems of sectioning, retardation, promotion, and elimination; 
the pedagogical problems of motivation and differential treatment; and 
the research problems involved in the measurement of educative 
capacity and of educational product. 
We may or may not agree to delegate to the educational philosopher
the task of determining the ultimate aims of education. By unanimous
agreement, the realization of those aims is at present left to the school 
administrator, the pedagog and the educational psychologist. By 
pointing out some of the limitations, as well as advantages, of
certain postulates and “axioms” connected with the AQ hypothesis,
the authors hope to show the probable effect of the hypothesis upon 
administrational and testing practices, to point out some new ends or 
aims of education that may soon come to the foreground of public 
discussion, and possible resulting changes in school administrational 
procedures. Thus the authors will raise many questions without 
attempting to answer them adequately, if at all. The man of practical
affairs may be inclined to remonstrate that our remarks are
destructive criticism rather than suggestive of constructive programs.
On the contrary, the authors hold to the point of view that an aware- 
ness of our ignorance is of as much value in pointing out the road to 
progress as a knowledge of our accepted scientific “truths.” 


A COMPARISON OF THREE CURRENT AQ CONCEPTS 


The accomplishment quotient is advocated by Franzen,¹ and
under the name “achievement quotient” by Monroe and Buckingham,²
as a measuring device for combining in an effective way the
results of educational and mental tests into a measure of educational
achievement relative to the pupil’s capacity to progress. Essentially the
same purpose is served by a different statistical technique developed
by Pintner.³

The AQ is to be considered as the “degree to which a pupil’s actual
progress has attained to his potential progress by the best possible
measures of both,”⁴ or as a “simple method of comparing a pupil’s
achievement age with his mental age (learning capacity).”⁵ Apparently
there is no single word in the English language which adequately
expresses what is meant by the term AQ. In its statistical derivation
it is quite as abstract a concept as z or r. Its formula is:


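$$\mathrm{AQ} = \frac{\mathrm{EQ}}{\mathrm{IQ}} = \frac{\mathrm{EA}/\mathrm{CA}}{\mathrm{MA}/\mathrm{CA}} = \frac{\mathrm{EA}}{\mathrm{MA}} \tag{1}$$

where,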


AQ = Accomplishment, or achievement, quotient 
EQ = Educational quotient 
IQ = Intelligence quotient 
EA = Educational age
CA = Chronological age 
MA = Mental age. 
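
As a small numerical illustration of equation (1), the following Python sketch computes the three quotients for a single pupil. The function name and the example ages are hypothetical and are chosen only to show that CA cancels, so that AQ reduces to EA divided by MA.

```python
def quotients(ea, ma, ca):
    """Return (EQ, IQ, AQ) for ages expressed in the same unit, e.g. months."""
    eq = ea / ca   # educational quotient
    iq = ma / ca   # intelligence quotient
    aq = eq / iq   # accomplishment quotient; CA cancels, leaving ea / ma
    return eq, iq, aq


# Hypothetical pupil: EA 108 months, MA 120 months, CA 132 months.
eq, iq, aq = quotients(108, 120, 132)
print(round(eq, 3), round(iq, 3), round(aq, 3))
```

Changing CA alone alters EQ and IQ but leaves AQ as it was, which is the cancellation the formula expresses.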


1 Franzen, R.: The Accomplishment Quotient. Teachers College Record, Vol. 
21, No. 5, Nov., 1920, pp. 432-440. 

2 Monroe, W. S. and Buckingham, B. R.: Illinois Examination. Teachers 
Handbook, University of Illinois, Bureau of Educational Research, July, 1920, p. 31. 

3 Pintner, R. and Marshall, H. A.: A Combined Mental-educational Survey. 
Journal of Educational Psychology, Vol. 12, No. 1, Jan., 1921, pp. 32-43. 

4 Franzen, R.: loc. cit., p. 436. 

5 Monroe, W. S. and Buckingham, B. R.: loc. cit., p. 11.
Pintner’s method consists essentially of transmuting educational
test and mental test scores into index values ranging from 0 to 100 
for a given age, average ability in each being 50. The method assumes 
a normal distribution of both mental and educational talent. Pintner 
uses as his measure of motivation: 


$$\text{Difference} = \text{Educational index} - \text{Mental index} \tag{2}$$


This measure is “the difference between a pupil’s native capacity and
his actual accomplishment.”¹

According to Franzen, an AQ of 1.00 indicates “optimum accomplishment”
or “what a pupil is able to do under the best conditions;”
and according to Monroe and Buckingham, it means “that the pupil
has achieved exactly as well as the average of the pupils of his mental
age;”² while, according to Pintner, an index difference of zero, occurring
when the mental index is equal to the educational index, or a
corresponding AQ of 1.00, apparently means that the pupil is doing
educationally exactly what “is usually accomplished by children of
like mentality.”³

According to Franzen, an AQ less than 1.00 means that the pupil
is doing school work which is less than normal for his mentality, and,
according to Monroe and Buckingham, “if a pupil’s achievement
quotient is 0.75, we have evidence that he has achieved only 75 per
cent as much as the average of the pupils of his mental age;”² while,
according to Pintner, “a minus difference means that the child is
doing less educational work than he has the ability to accomplish,”³
although, as noted below, he does not imply that a plus difference
indicates that the pupil is doing more work than he has ability to
accomplish.

According to Franzen an AQ of more than 1.00 is impossible, as
represented in his statement: “One’s differences when EQ is subtracted
from IQ are always positive when they are large enough to be significant
and small enough to seem spurious when they are negative
. . . It is safe, therefore, for practical use to assume that the
optimum accomplishment is 1.00;”⁴ and according to Monroe and
Buckingham, “If the pupil’s achievement quotient is 130, it means that
he has achieved 30 per cent more than the average of the pupils




1 Pintner, R. and Marshall, H.: loc. cit., p. 37. 

2 Monroe, W. S. and Buckingham, B. R.: loc. cit., p. 11. 
3 Pintner, R. and Marshall, H.: loc. cit., p. 38. 
4 Franzen, R.: loc. cit., p. 436.
of his mental age,”¹ while according to Pintner, “a plus difference
means that the pupil is doing more educationally than has usually
been accomplished by children of like mentality.”²

Pintner, and Monroe and Buckingham find many pupils making
“more than average accomplishment for their mental age,” Pintner
specifically stating that “it is useless to attempt to set up any such
ideal standard (of what ought to be accomplished under ideal conditions
where each child is working up to the limit of his capacity);”
in contradistinction, Franzen states, “we can measure the approximation
to ideal educational performance of any one child in any one
elementary school subject through the approximation of this accomplishment
quotient to 1.00.” It is evident that even among the originators,
there is a great difference of opinion in regard to the meaning
to be attached to any AQ. Part of this disagreement, no doubt, will
be eliminated once all compute their indices in identical statistical
fashion, and on the same tests.³


THE DISAGREEMENT IN TERMINOLOGY INVOLVED IN THE AQ
HYPOTHESIS


Dr. Otis has pointed out to us the errors of terminology in which 
we are likely soon to be involved in regard to the various ratios. We 
find research workers talking of a reading quotient, that is, of the ratio 
of reading-subject-matter age to chronological age; of a reading- 
accomplishment quotient, that is, the ratio of reading-subject-matter 
age to mental age; and finally of a more general accomplishment 
quotient in the sense of the ratio of average subject-matter ages in a 
number of school subjects to mental age. By early agreement, 
research workers may decide upon adequate definitions of standard 
terms and thereby prevent ultimate hopeless confusion. It may be 
noted that the term used for the more general accomplishment quotient 
must be defined in terms of the subject-matter ages to be included while 
also taking into account how they are to be combined or weighted if we 
are to hope for even approximately valid comparisons of the work of 
various research workers. A pupil’s general AQ evidently depends in a 
very real way upon his election of subject matter and so cannot be 


1 Monroe, W. S. and Buckingham, B. R.: loc. cit., p. 38.

2 Pintner, R. and Marshall, H.: loc. cit., p. 38. 

3 Part of the confusion is due to the fact that Franzen used the Stanford
Revision Individual Test while the other investigators used Group Intelligence 
Tests. 
expected to be as constant even as his IQ. Likewise, mental age needs
to be defined in terms of what tests shall be used and how they shall be 
combined if we are to hope for any reasonably comparable results in 
intelligence measurement from different research workers, or even 
from the same research worker in successive instances. In this article, 
unless specifically noted otherwise, we have taken the term ‘‘accom- 
plishment quotient” to mean either the subject-matter-accomplish- 
ment quotient or the more general accomplishment quotient. Our 
discussion of the limitations of the AQ procedure will hold for either
the more specific or the more general case. 


DIFFICULTIES INVOLVED IN THE NORMS USED, AND IN THE SELECTION
OF STANDARD TESTS


One cause for confusion in the interpretation of the meaning of a 
given AQ lies in the difference in procedure used in computing norms. 
Monroe and Buckingham, and Pintner, follow the customary proce- 
dure in determining a norm; namely, finding that score which is the 
median for a given age and calling that score the norm for the age. 
Franzen finds the average age of all people who make a given score, 
thereafter calling the given score the norm for the average age thus 
found. Thus, stated in statistical terminology, the former workers 
make use of the regression of score on age, while the latter makes use 
of the regression of age on score. The regression of age on score, it 
will be noted, is the customary regression line used in such problems 
as that of predicting the age at death from a statistical measure of the 
person made prior to the event. The adoption by all workers of the 
other regression line, if proven statistically advisable, is not an impossi- 
ble task. We point out below that the use of the other regression line 
in norms does not do away with what seems to us to be a very real 
objection to the AQ procedure.¹
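
The difference between the two kinds of norm can be seen in a small simulation. The Python sketch below is an illustration only: the pupils, the linear relation of score to age, and the amount of scatter are all invented assumptions, not data from any of the investigators named. It fits both regression lines and reads off the "age norm" that each would assign to the same score.

```python
import random
import statistics

random.seed(0)

# Hypothetical pupils: age in years, and a score loosely related to age.
ages = [random.uniform(8, 14) for _ in range(2000)]
scores = [5 * a + random.gauss(0, 8) for a in ages]


def slope_intercept(x, y):
    """Least-squares line predicting y from x; returns (slope, intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b = sxy / sxx
    return b, my - b * mx


# Customary norm: regression of score on age (typical score at each age).
b_sa, a_sa = slope_intercept(ages, scores)
# Alternative norm: regression of age on score (average age at each score).
b_as, a_as = slope_intercept(scores, ages)

score = 70.0  # a score well above the group average
age_from_score_on_age = (score - a_sa) / b_sa  # invert the first line
age_from_age_on_score = a_as + b_as * score    # read the second line directly
print(round(age_from_score_on_age, 2), round(age_from_age_on_score, 2))
```

For a score above the group average, the regression of age on score assigns a lower age than the inverted regression of score on age, so the same raw score yields different "ages", and hence different quotients, under the two conventions.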

The question of equivalence of scores on tests constructed by differ- 
ent research workers is also in a state of flux, as are many of the statis- 
tical implications of mental and educational measurements of which 
the controversy regarding the two regression lines in norms is but one 
example. Otis is advocating the use of a line which, when plotted,
lies between the two regression lines, for converting mental test scores
of one scale into “equivalent scores” on the other, disregarding the







1 Recent reports show that local community selection is so great that “blanket” 
norms are often meaningless. See Chapman below. 
fact that there is no true equivalence of two test scores. Without
true equivalence of different mental and educational scales, we cannot
expect identity of interpretation of AQ’s secured by different workers
using different mental tests, educational tests, or both. We are continually
being reminded nowadays that the IQ was devised as a brightness
measure for one intelligence scale, the Stanford Revision of the Binet
Scale; and that, consequently, the IQ procedure is not in strict scientific
usage applicable to other scales than the Stanford scale.² If the AQ
procedure is to have a monopoly on Stanford IQ’s, it necessarily must 
have a monopoly on Stanford MA’s, for it will be seen that the CA’s 
cancel out in equation (1), leaving only two simple variables, EA and 
MA. We need but one of these invalidated in order to have the 
whole fractional equation invalidated. 

And whose EQ shall be considered a standard one? Not only 
does this point to an inadequacy of the AQ procedure but of the IQ and 
EQ procedures as well. There is good reason for believing that the 
IQ is not the best possible brightness measure, even in the case of 
the Stanford Scale. As hinted at by Toops and Pintner,³ there are an
infinite number of comparatively simple equations of the first degree
(not to mention higher degrees) of the type,

$$\frac{(\mathrm{MA}) + K}{(\mathrm{CA}) + K} \tag{3}$$


which will fulfill the requirements of yielding: (1) A ratio of 1.00 
for perfectly normal individuals, (2) ratios of more than 1.00 for 
individuals brighter than normal, and (3) ratios of less than 1.00
for individuals who are duller than normal. A particular one of this
family of curves may fulfill better the additional desirable requirement
of approximate constancy through the grade-school ages than does
the present IQ formula. The IQ equation can be thought of as the 
simplest possible case of the more generalized mathematical ratio, 


$$\frac{(aM^{n} + bM^{n-1} + cM^{n-2} + \dots + M) + K}{(aC^{n} + bC^{n-1} + cC^{n-2} + \dots + C) + K} \tag{4}$$

where M equals mental age, and C equals chronological age. Among


1 Thorndike, E. L.: On Finding Equivalent Scores in Tests of Intelligence. 
Jour. of Appl. Psych., Vol. 6, No. 1, 1922, pp. 29-33. 

2 Trabue, M. R.: Some Pitfalls in the Administrative Use of Intelligence 
Tests. Jour. of Educ. Research, Vol. 6, No. 1, 1922, pp. 1-11. 

3 Toops, H. A. and Pintner, R.: Curves of Growth of Intelligence. Jour. of
Exp. Psych., Vol. 3, No. 3, 1920, p. 235. 
the “infinite-infinite” number of possible equations, there is probably
one which will fit the empirical facts better than the one now used. 
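
The three requirements laid on the first-degree family can be checked with a few lines of Python. This is only a sketch under an assumption: the ratio is read here as (MA + K) / (CA + K) with the same constant K above and below, and the function name and the values of K are hypothetical illustrations.

```python
def brightness_ratio(ma, ca, k=0.0):
    """Ratio (MA + K) / (CA + K); K = 0 gives the ordinary IQ formula."""
    return (ma + k) / (ca + k)


for k in (0.0, 2.0, 5.0):  # arbitrary illustrative constants
    normal = brightness_ratio(10, 10, k)  # MA equal to CA
    bright = brightness_ratio(12, 10, k)  # MA above CA
    dull = brightness_ratio(8, 10, k)     # MA below CA
    print(k, normal, round(bright, 3), round(dull, 3))
```

Every member of the family gives 1.00 to the normal child, more than 1.00 to the bright child and less than 1.00 to the dull child; the members differ only in how far from 1.00 they place the last two, which is why the choice among them must rest on empirical constancy.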


TECHNICAL DIFFICULTIES IN SECURING ALTERNATIVE STANDARDS OF 
CAPACITY AND ATTAINMENT 


Fairly comparable IQ’s, EQ’s and AQ’s can only be obtained by at 
least taking into account the reliability coefficients of the different educa- 
tional and mental tests respectively. That is, concretely, an IQ 
secured by the Jones Mental Test can only hope to measure exactly 
the same thing as an IQ secured by the Johnson Mental Test if the Jones 
Test correlates perfectly with the Johnson Test. Not even equal 
correlation with the same identical criterion of intelligence solves the 
problem. As an illustration, suppose two tests are totally
uncorrelated with each other; they may yet each correlate with a valid
criterion of intelligence to the maximum extent of 0.71. Denoting the
criterion by 1 and the two tests by 2 and 3, this may be shown by
substituting the values $r_{12} = r_{13}$ and $r_{23} = 0$ in the formula
for the multiple correlation coefficient involving three variables when
the multiple correlation coefficient is a maximum, or 1.00. Thus:

$$1.00 = R_{1.23} = \sqrt{\frac{r_{12}^{2} + r_{13}^{2} - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^{2}}} \tag{5}$$

Substituting the above values,

$$1.00 = 2 r_{12}^{2}, \quad \text{whence } r_{12} = r_{13} = \sqrt{0.50} = 0.71$$
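
The substitution in formula (5) can be verified numerically. The Python sketch below uses the standard three-variable multiple correlation formula; the function name is hypothetical, and variable 1 is taken to be the criterion with 2 and 3 the two tests.

```python
import math


def multiple_r(r12, r13, r23):
    """Multiple correlation of variable 1 with variables 2 and 3 jointly."""
    numerator = r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23
    return math.sqrt(numerator / (1 - r23 ** 2))


r = math.sqrt(0.50)  # each test's correlation with the criterion
print(round(r, 2), round(multiple_r(r, r, 0.0), 6))
```

With the two tests uncorrelated and each correlating about 0.71 with the criterion, the multiple correlation reaches unity; any lower pair of equal correlations leaves it below 1.00.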


If it were possible to construct two “intelligence” tests, a group
test and an individual test, which would correlate zero with each
other, both might yet correlate equally with a “valid criterion of
scholarship” as highly as 0.71; and yet on the one test an idiot would
as likely as not be rated genius, and vice versa, which argues neither
for the group test nor for the individual test. This problem must be
settled on another basis than statistical theorizing since it is practically 
impossible to design two tests according to the above specifications. 
To determine socially valid and comparable AQ’s we must consider 
validity, correlation with an adequate criterion, as well as correla- 
tion between the two intelligence and the two educational tests used by 
two different research workers. The two forms of test may be per- 
fectly reliable and yet not measure at all what we would have them 
measure. The conclusion is obvious. We need not a commercial- 
ized multiplication of scales and an equally thoughtless diversity of 
statistical techniques but an ultimate soundness of method. The
very equation of the AQ, if written in another form 


$$\mathrm{EA} = (\mathrm{AQ})\,(\mathrm{MA})$$

(educational achievement) = (educational environment) × (mental capacity)


means that educational achievement is equal to mental capacity as it 
is acted upon by an educational environment (motivator) which varies 
in its intensity for individual pupils from somewhat more than zero 
AQ to somewhat more than 1.00 AQ. It would be but mockery to 
say that any one of our multitudinous intelligence scales measures 
“mental capacity” when scarcely any of them correlate highly with
each other, none correlates highly with the social acts wherein intelli- 
gence functions, and some do not correlate very highly even with them- 
selves in their alternative forms. A ratio of such unreliable variables 
is necessarily less reliable than either of its components. 

In securing his measure of capacity, Pintner uses non-verbal
intelligence tests “to get as far away from language and the things
taught in school” as possible. Yet his educational norms are still based
on what is now taught. Pintner quite rightly wishes to get an “ultimate
measure of ability or rate of doing work” which he hoped to get
in non-language tests. It is known, from the work of Herring,¹
Gates, the N. I. T. tests, and others, that non-language intelligence
tests do not correlate nearly so well with ability to get along in school 
(as the school subjects are now taught) as do verbal tests. It has been 
found that the more verbal the tests the higher their correlation with an 
“adequate” criterion of intelligence or of ability to get along in school.
We need only consider the limiting case of an intelligence test which is
so “non-verbal” as to correlate zero with achievement in order to see
that the measure of capacity must correlate highly with the measure of 
attainment. The real requirement is that the test used shall be as 
little susceptible as possible to improvement through practice or 
coaching. 


THE EXPERIMENTAL GROUP WHICH DETERMINES “CAPACITY” NORMS
SHOULD BE MAXIMALLY MOTIVATED


Without testing “maximally motivated” pupils to determine our
norms of “potentiality,” we can but approximate an ultimate measure


1 Herring, J. P.: Verbal and Abstract Elements in Intelligence Examinations. 
Jour. of Educ. Psych., Vol. 12, No. 9, 1921, pp. 511-517. 
of the capacity to do school work. We need not deny our slow but
steady progress in measurement.¹ It will do no harm, however, to
realize that so long as our mental tests correlate with an “adequate
criterion” to the extent of less than 0.71, “all other unrelated factors”
will correlate with the same criterion to a greater extent than 0.71;
and further, that “a composite of all other unrelated factors including
also some factors common to the first mentioned test,” such a composite
as it would be impossible to approximate in an almost “totally
different” type of test, would correlate with this same criterion
considerably in excess of 0.71. Not until we construct intelligence tests
which will correlate to the extent of 0.87 with such a criterion will we
reduce to half the standard deviation of the criterion its standard
error of estimate.
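
Both figures in the paragraph above follow from the single quantity √(1 − r²). The Python sketch below is offered as an arithmetic check only; the function name is hypothetical, and the first reading assumes that the test and the unrelated remainder together determine the criterion completely.

```python
import math


def alienation(r):
    """sqrt(1 - r^2) for a test correlating r with the criterion.

    Read one way, it is the correlation with the criterion of all factors
    unrelated to the test (assuming test plus remainder fix the criterion).
    Read another way, it is the standard error of estimate expressed as a
    fraction of the criterion's standard deviation.
    """
    return math.sqrt(1 - r ** 2)


for r in (0.50, 0.71, 0.87):
    print(r, round(alienation(r), 2))
```

At r = 0.71 the test and the remainder are about equally related to the criterion; below that value the remainder has the advantage, and only near r = 0.87 does the standard error of estimate fall to about half the standard deviation of the criterion.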

Evidently there is no method whereby statistically we can determine 
when a child is maximally motivated. The best we can do is to 
arrange an experimental class with the best conditions and incentives 
to maximal effort that the best pedagogical judgment can devise and 
then measure what educational product is produced. The person 
who can arrange greater incentives in a subsequent experiment will 
secure a greater educational product. “Maximal motivation without
neglect of essential school activities” would yield the best norms,
“balanced” norms. Scientific method requires that such a group be
used as the experimental group in constructing the tests of capacity 
and achievement. Even then we are in exactly the same position as 
the time study men of industry who decide that a fair day’s work is 
what the average man produces. In the long run what is considered 
fair is what the workers will agree to accept as a fair day’s work; it is 


1 Pressey, S. L.: Suggestions Looking toward a Fundamental Revision of 
Current Statistical Procedure as Applied to Tests. Psych. Rev., Vol. 27, 1920, 
pp. 466-472. 

Ruml, B.: Reconstruction in Mental Tests. Jour. of Phil., Psy. and Sci. 
Meth., Vol. 18, No. 7, 1921, pp. 181-185. (A criticism of Pressey above.) 

Pressey, S. L.: Empiricism versus Formalism in Work with Mental Tests.
Jour. of Phil., Psy. and Sci. Meth., Vol. 15, No. 16, 1921, pp. 393-398. (The 
reply to Ruml’s criticisms.) 

Ruml, B.: The Need for the Examination of Certain Hypotheses in Mental 
Tests. Jour. of Phil., Psy. and Sci. Meth., Vol. 17, No. 3, 1920, pp. 57-61. 

Kelley, T. L. and Terman, L.: Dr. Ruml’s Criticism of Mental Test Methods. 
Jour. of Phil., Psy. and Sci. Meth., Vol. 18, No. 17, pp. 459-465. (Reply to Ruml 
directly above.) 

Chapman, J. C.: Some Elementary Statistical Considerations in Educa- 
tional Measurements. Jour. of Educ. Research, Vol. 4, No. 3, 1921, pp. 212-220. 
not necessarily a scientifically determined quantity of work in spite
of its scientific appearance, its abundance of fine but unreliable
measurement.


SOME CURIOUS PHENOMENA NOTED IN THE USE OF THE AQ


Let us inquire into the educational treatment accorded subjects of 
low AQ by Franzen. We see “unmotivated” pupils, discovered by 
the AQ procedure, given special educational treatment with the general 
result that their AQ’s are brought up to 1.00, but not beyond it. 
What may be one explanation of this case? It has repeatedly been 
shown by Kirby, Chapman and others that motivation leads to 
distinct improvement. In fact, Thorndike¹ tells us that there is no
reason to believe that in many functions the acceleration of improve- 
ment within the ordinary physiological limits need be a negative one 
provided we furnish sufficient motivation. Franzen used the Stanford 
Test while the other workers used group tests; he likewise used a 
different regression line in computing his norms. Aside from these 
differences, is it not likely that his subjects did improve up to 
the given expected point, the goal of 1.00 AQ, and that then 
improvement did stop with few going beyond AQ’s of 1.00, because 
the teacher and pupils were led to believe that an AQ of 1.00 was 
satisfactory; that is, that the “motive” to improve was greatly
lessened or suddenly became of zero value as soon as an AQ of 1.00
was reached? If, as will be shown shortly, half or more of the dull
pupils can expend more than “normal effort,” why cannot all of
humanity do more than an AQ of 1.00? It probably can! Why, 
then, if sufficient incentive is provided, should not at least half of 
his school system do more than the average amount of school work 
usually done by people of the same mental age in school systems in general? 
Does not the greatest value of the AQ, after all, consist not in its
measuring value but in its incentive value, its value in getting the
teacher and pupil interested in progress? The graph of progress is a
very real incentive to the pupil, so very effective because it compares
his educational attainment with himself; because he is competing with
himself, and is not required to beat out pupils of greater ability.
For if a pupil, even one of low IQ, works up to an AQ of 1.00, he is doing
“just as well” as the pupil of greater intelligence who does more work
and achieves a greater EA. Is mental capacity to be likened to the 


1 Thorndike, E. L.: Educational Psychology, Vol. 2, 1913, p. 257. 
fabled beggar’s wallet which can be filled only so full before it will be
filled to overflowing and burst from its very opulence? And, yet, 
may it not be good school policy for the present to keep the AQ from 
going above 1.00, in order to insure that the school will not put too 
much emphasis merely on the things which the tests measure, and 
allow opportunity for securing some of the appreciations or attitudes 
which, though intangible, are valid objectives of education? The 
EQ aims only to measure education as it now is, and not as it ought to 
be. Granted the truth of the hypothesis, the AQ procedure is a very 
real incentive method, as shown by the fact that the correlation of 
about 0.6 between EQ and IQ at the beginning of Franzen’s experi- 
ments was pushed to about 0.9 by intensive stimulation of his pupils 
to effort. 

Another curious phenomenon, noted by Pintner, is that there 
are more bright people not working “up to capacity” than dull ones
who are “doing more than is expected on the average of pupils of their
mental capacity.” Another investigator in an unpublished report
finds a correlation of -0.40 between MA’s and AQ’s in the case of
pupils of Grades V to VII. Is there not significance in this fact which
we may interpret from the known facts of the school situation itself?
Is it not a remarkable coincidence that the “below normal” in intelligence
are for the most part above average in motivation while the “above
normal” in intelligence are for the most part below average in motivation?
We are often inclined to accept the generalization that all
good things are positively correlated; correlation and not compensation 
is the rule in human nature. Either human nature is perverse in its 
schoolroom duties, or the school methods are badly at fault. 

Both statistical methods evidently assume that the immediate 
ideal in education should be to raise the AQ of all “poorly motivated”
pupils to 1.00. This makes the statistical assumption that all pupils
in all school subjects of a “perfectly adjusted and maximally
motivated” school should have an AQ of 1.00; or, all plotted points
would lie on a straight line of regression when the subject-matter ages 
in a given school subject are plotted against mental age; or that, 
stated differently, in a properly motivated school working up to maxi- 
mal capacity, the correlation between mental age and subject matter 
age is 1.00. There is much empirical evidence which will cause us to 
doubt this ultimately perfect correlation. In the most highly cor- 
related of physical sizes of bilateral members of the human body, such 
as the length of the right arm correlated with the length of the left 
arm (not to mention the lesser correlations of the physical capacities
of such bilaterally symmetrical members), the correlation is always
somewhat less than unity. Should we then expect a perfect correla- 
tion between mental capacities? Franzen finds that remedial educa- 
tional measures recommended for pupils with AQ of below unity 
brought a majority up to unity, and as above pointed out, he believes 
that AQ’s above unity are spurious, at least when using his methods of 
computation. The same educational procedure applied to the pupils
already at an AQ of 1.00 might have produced some very large AQ’s. 

A partial explanation of this curious phenomenon will now be 
presented. It seems likely that the AQ results are in part due to the 
statistical assumptions underlying the accomplishment index tech- 
nique rather than to an ultimate soundness of method. If we assume, 
for the moment, that there is not a perfect correlation between educational
age and mental age, then even in a “maximally motivated school” (one in
which teaching, school environment and “effort” are ideal) educational
index regresses upon mental index and vice versa. As will be shown
below, pupils of high IQ are then more likely than not to be lower in
EA than their MA “would warrant,” and conversely, pupils of low
IQ are more likely than not to have an EA higher than their MA
“would warrant.”


If we assume normal correlation of the EQ and IQ, when r is less than 1.00, 
we obtain a surface of distribution of the two, when plotted against each other, 
similar to that of Fig. 1. The line cc, drawn at an angle of 45° to the horizontal, 
would represent a line of perfect correlation; i.e., a line on which would be plotted
all persons whose EQ’s exactly equal their IQ’s. Consequently all persons happen- 
ing to fall on this line have AQ’s exactly equal to 1.00. Any person lying above 
this 45° line, is found to have an EQ greater than IQ, and will therefore have an 
AQ greater than 1.00. Such a person is P. Conversely, any person lying below 
this 45° line has an AQ which is less than 1.00. Such a person is Q. 

The regression of y on x, or yy, is so drawn that it bisects each vertical array.
Its equation is $y = r \frac{\sigma_y}{\sigma_x} x$. Let us now consider any given vertical array of
people having IQ’s less than 1.00; i.e., subnormal in mentality. Such an array will
be any vertical array to the left of the average (Mx) of the IQ’s such as is represented
by hkls. Half of the area of the array is included above the line of regression of
y on x; that is, 50 per cent. in jklm. But the area of persons in the array with AQ’s
greater than 1.00 is represented by the area, ikln = jklm + jmni = 50 per cent.
of the array + the area, jmni. That is: more than half (as represented by the excess
area, jmni) of any unselected dull persons have AQ’s greater than 1.00 solely by
reason of geometrical necessity, irrespective of whether the hypothesis that EQ
can be brought up to IQ is ultimately sound or not; that is, a correlation surface
of less than 1.00 always has such an area as yoc. Conversely, by consideration
of any vertical array hkls of brighter than average persons, more than 50 per cent.
have AQ’s less than 1.00 solely by reason of geometrical necessity, as above. It
then follows that (1) the measuring validity of the AQ is thus far an unproved
postulate, and that (2) the demonstrated bringing of low AQ’s up to 1.00 by special
educational treatment is no proof of a fundamentally perfect relationship between
mental ability and educational achievement. As a corollary to (2), if it be assumed
that fundamentally there is not perfect correlation of EQ and IQ, then even in a
perfectly sectioned school, geometrically it would be expected (be perfectly
“normal”) to have more than half of the dull people with AQ’s more than 1.00 and
half of the bright people with AQ’s of less than 1.00. The area yoc diminishes in 
size as r becomes larger, but does not disappear until r becomes 1.00. Thus the 
AQ, as a statistical measure, is the old problem of trying to lift one’s self by one’s 
boot straps. In all the above it is assumed that the distribution of both IQ’s 
and EQ’s is normal, which may or may not be true. 
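
The geometrical argument can also be checked by simulation. The Python sketch below is an illustration under stated assumptions, not a reanalysis of any data: IQ and EQ are drawn from a normal correlation surface with an assumed correlation of 0.6 and an assumed standard deviation of 0.15 about 1.00, and no pupil differs from another in "motivation".

```python
import math
import random

random.seed(1)
r = 0.6    # assumed EQ-IQ correlation, below 1.00
sd = 0.15  # assumed spread of both quotients about 1.00

dull_total = dull_above = bright_total = bright_below = 0
for _ in range(50000):
    x = random.gauss(0, 1)                                 # IQ deviate
    y = r * x + math.sqrt(1 - r * r) * random.gauss(0, 1)  # EQ deviate
    iq, eq = 1 + sd * x, 1 + sd * y
    aq = eq / iq
    if iq < 1:
        dull_total += 1
        dull_above += aq > 1    # dull pupil with AQ above 1.00
    elif iq > 1:
        bright_total += 1
        bright_below += aq < 1  # bright pupil with AQ below 1.00

frac_dull_above = dull_above / dull_total
frac_bright_below = bright_below / bright_total
print(round(frac_dull_above, 3), round(frac_bright_below, 3))
```

Under these assumptions well over half of the pupils with IQ below 1.00 show AQ’s above 1.00, and well over half of those with IQ above 1.00 show AQ’s below 1.00, although nothing but imperfect correlation has been put into the simulation.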


It should be noted that, with all its defects, the AQ method will 
pick out the school or the individual who has an AQ of such magnitude 
that it would seldom happen in a normal correlation plot between EQ 
and IQ. Like the IQ, a difference of 0.01 in AQ in the case of two 
people with AQ’s respectively of 0.65 and 0.66 is a different amount 
from the difference between two persons with AQ’s of 0.99 and 1.00 
respectively. Perhaps by a more complicated mathematical procedure, 
wherein we calculate the probability of a child “working up to capacity,”
we may yet improve the measuring value of the AQ as well as
secure its value as an incentive method.¹ A rough empirical beginning
in this direction has been made by Pintner in setting up his ±
boundary lines of “normal effort.” We may reiterate the time-worn
statement: The simplest explanation is by no means necessarily the 
truest. At best, under present circumstances, using norms computed 
in the usual way (norm = average score for a given age), AQ’s of more 
than 1.00 seem just as logical and just as necessary (although at 
present not as abundant, as heretofore pointed out) as AQ’s less than 
1.00. The explanation thus suggested by the geometrical approach is 
that both the positive and negative differences so far found, likewise 
all AQ’s not 1.00, are entirely due to the lack of perfect correlation 
between the mental and educational indices, part of the differences 
indicating true school maladjustment, and part being as yet 
undetermined. 

The use of the regression of age on test score in norms will alter the 
proportions of the above diagram but will not eliminate AQ’s of more 
than 1.00, if both MA’s and EA’s are computed by the same method. 

Whether the AQ can be brought up to 1.00 or to 1.50 or to any 
predetermined figure, depends to a very great extent upon the EA used, 
the nature of the scale or examination, the norms used and the school 
subject under consideration. Presumably a child with good arithmetical
ability might be easily brought up to an AQ of 1.00 in the “four
fundamental operations” of arithmetic since his EQ, the numerator of
equation (1), might be easily brought up to normal, while it might be
found much more difficult to bring him up to an AQ of 1.00 in an arithmetical
test dealing largely with arithmetical reasoning. We no
longer talk of the “rote memory type” and “logical memory type”
of person, but we know that there are apparently rare cases of “special
abilities” and “disabilities” in school subjects, and even in different
parts of the same school subject. Neurologists and psychologists
disagree regarding their neural basis, and proper remedial treatment.
Such cases of “disability” are “real enough” to the teacher to be
labeled with the term, even though theoretically they may prove to be
non-existent.





1 Since writing this article, the authors have been privileged to read an as yet 
unpublished manuscript by J. C. Chapman wherein he demonstrates to his own 
satisfaction that the difference, between educational age and mental age, obtained 
from a single testing, is quite too unreliable for individual readjustment of pupils. 

Editor’s Note: Dr. Chapman’s article is published in this issue.

OTHER POSSIBLE EXPLANATIONS OF THE LESS THAN UNITY CORRELATION
BETWEEN EA AND MA


Pintner’s norms are based on the very pupils upon whom he 
reports. Pintner assumes that anyone whose educational index is 
more than eight points advanced over his mental index is advanced in 
motivation, and vice versa. Had he called “advanced” all people who
were on the plus side, and “retarded” all who were on the minus side,
then with these two groups he would find approximately 50 per cent of
retarded motivation and 50 per cent of advanced motivation. In the
light of an article by Toops and Pintner¹ dealing with pupils of the
same city, and social status similar to that of those who determine 
Pintner’s intelligence norms, it becomes evident that one very impor- 
tant reason for so many of the mentally advanced pupils being retarded 
in accomplishment is the fact that many bright pupils are promoted by 
chronological age rather than by ability to progress and so have not had 
the chance to come up to normal by being given opportunity to do 
advanced work. In the article cited it was found that only 15 per 
cent of the total of 1218 pupils were advanced in school one semester or 
more, while, by the same standard, 37 per cent were retarded in school 
one semester or more. With such a state of affairs, it undoubtedly is 
true that many dull pupils are attempting too difficult work, while it is 
assuredly true that many bright pupils are not attempting as difficult 
work as they are capable of doing. 

Another factor which always operates in mental measurements is 
attenuation. The fact that at present the correlation between mental 
and educational indices is less than 1.00 may be partly explained by 
the inaccuracies in the measurements which always tend to attenuate, 
or lower, the correlations. 
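The size of this effect can be illustrated with Spearman's correction for attenuation, which divides the observed correlation by the square root of the product of the two reliability coefficients. The authors do not cite the formula here, so the sketch below is offered only as an illustration of the point; the function name and the numerical values are hypothetical and are not drawn from any data in this article.

```python
import math

def disattenuated_r(r_observed: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation: r_observed / sqrt(rel_x * rel_y)."""
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / math.sqrt(rel_x * rel_y)

# Hypothetical values: an observed correlation of 0.60 between mental and
# educational indices whose reliabilities are 0.80 and 0.90.
print(round(disattenuated_r(0.60, 0.80, 0.90), 3))
```

With these assumed figures the corrected correlation is about 0.71, still short of 1.00, so errors of measurement would account for only part of the discrepancy.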

Undoubtedly other reasons for the less than 1.00 correlation are to
be found in “special abilities or disabilities” and interest, actual
laziness, etc. Other bad conditions in home and school exert their
influence also. It will be found that most of the bright but less than
1.00 AQ pupils are doing above average work in the grades they are now
in, where merely “passing” performance is acceptable. Pintner, Coy,
Whipple, Coxe, and others have shown that motivation and more and 
better school work is the almost inevitable result when mentally 
advanced pupils are promoted in school to that point where they have
the competition of pupils of about the same mental ability, or rates
of progress.

1 Toops, H. A. and Pintner, R.: Mentality and School Progress. Jour. of
Educ. Psych., Vol. 10, No. 5-6, 1919, pp. 253-262.

Dr. May, as the result of a preliminary investigation of the rela- 
tionships of the hours spent in study, intelligence and school marks of 
college students, advises the writers that in the case of his subjects 
there is a decided negative correlation between hours spent in study 
and intelligence; this certainly means that bright students, able to
“get by the passing mark” with little study, prefer to spend a proportionately
larger amount of their time on other than study activities;
consequently, by consideration alone of the very objective measure of 
number of hours spent in study, it is very evident that such bright 
pupils would be readily able to accomplish more if required to do so 
by being placed in a section of bright pupils where competition between 
the extreme members of the class would be more truly real competition. 


(Continued in January) 














THE METHOD FOR FINDING THE CORRESPONDENCE
BETWEEN SCORES IN TWO TESTS


ARTHUR S. OTIS
Yonkers-on-Hudson, New York 


PURPOSE 


The statement made by the writer that “the equation of the
line which most probably expresses the true relationship between x
and y is $y = \frac{\sigma_y}{\sigma_x} x$” has been challenged by eminent statisticians and
for that reason it has seemed desirable to publish a proof.¹

The statement referred to the variables x and y, as two measures of
the same trait. (In the particular case under discussion the trait was
general mental ability.) The values x and y were subject to errors of
measurement, causing them to correlate less than 1.00 with each other.

It has been contended that the regression line, $y = r_{xy} \frac{\sigma_y}{\sigma_x} x$, expresses
the true relationship between x and y, and it is with the special purpose
of correcting this view that the present article is written.


METHOD 


It has seemed desirable to give the proof in two forms; first, a proof 
by analogy which, while not rigorous, is nevertheless believed to be 
vivid and suggestive, and second, a rigorous mathematical proof. 


FIRST PROOF


A Hypothetical Case.—In order to bring out clearly the difference
between the two lines referred to above, namely, the line whose equation
is $y = r_{xy} \frac{\sigma_y}{\sigma_x} x$, which is called a regression line, and the line whose
equation is $y = \frac{\sigma_y}{\sigma_x} x$, which is called in this article the relation line, let
us consider a hypothetical case of two variables. Take for example
the Fahrenheit and Centigrade thermometer scales. If both were
applied to the same thermometer, portions of each scale would
correspond as shown below.


 C.    F.
20—||—68
15—||—59
10—||—50
 5—||—41
 0—||—32


That is, 0°C. measures the same temperature as 32°F., 5°C. measures
the same temperature as 41°F., etc.

1 This statement appeared in the article entitled The Reliability of the Binet
Scale and Pedagogical Scales. Journal of Educational Research, September, 1921,
p. 132. In that article the reliabilities of x and y were assumed to be equal.

Now let us suppose that each of the several temperatures is read 
independently by two persons, one reading Centigrade and one 
Fahrenheit. Let us suppose for the moment that in a certain experi- 
ment the thermometer is read by both persons, 


16 times while standing at 15°C.,
32 times while standing at 10°C., and
16 times while standing at 5°C.


If the readings by both individuals are accurate in all cases and if 
plotted, these would appear as shown in Plot A. 

For the sake of introducing the factor of error, let us suppose, 
instead, that the person reading the Fahrenheit scale stands so far 
away from the thermometer that the numbers are indistinct so as to be 
often misread. Let us suppose that half the readings at each tem- 
perature are correct, that one-fourth are one graduation too high, 
and one-fourth are one graduation too low. If the readings of tem- 
perature thus made by the two persons were plotted these would 
appear as shown in Plot B. 

Now let us suppose both persons were to read the thermometer from 
so far away, as to make similar errors, half of the readings of each 
temperature being correct, one-fourth too high, and one-fourth too 
low. This will give us the sort of correspondence between unreliable 
readings of the same temperature by two different scales that is found 
between the two unreliable measurements of mental ability by two 
mental ability tests. Each of the numbers 4, 8, and 4 in the 15°C. 
row of Plot B would in this case be split vertically into a fourth, a half, 
and a fourth so that the 16 readings of actual temperature 15°C., by 
the two persons, when plotted would appear as shown in Plot C. 
Similarly the 32 readings of actual temperature 10°C., by the two 
persons, when plotted would appear as shown in Plot D. Similarly
the readings of actual temperature 5°C., by the two persons, when
plotted would appear as shown in Plot E.

When the pairs of readings of all 64 temperatures were plotted these 
would constitute the summation of Plots C, D and E, as shown com- 
bined in F and summated in G. At the top and right edges of the 
plot are shown the totals of the columns and rows. 

Plot G has been converted into Plot H by placing the numbers 
representing the frequency of readings at the intersections of lines 
instead of in the squares.

The Regression Line—Now in Plot H, let us consider first the 
four cases in which the temperature was read as 32°F. In these four 
cases the readings on the Centigrade scale were 1 at 0°, 2 at 5°, and 
1 at 10°, with a mean reading of 5°. Next take the array¹ of 16 cases
in which the temperature was read at 41°F. In this array the readings 
on the Centigrade scale were 2 at 0°, 6 at 5°, 6 at 10°, and 2 at 15°, the 
mean of these being 7.5°. And so on. If we drew a straight line 
through the means of all these arrays the line would be located as shown 
at M. This is called a line of regression. 
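The array means quoted above can be reproduced from the stated assumptions alone. The sketch below rebuilds the frequencies of Plot G (16, 32 and 16 true temperatures at 5°, 10° and 15°C., each reading on either scale being correct half the time and one graduation high or low a quarter of the time each) and then averages the Centigrade readings within each Fahrenheit array. The errors of the two readers are taken as independent, which is what the splitting described for Plot C implies. The function names are illustrative, not the author's.

```python
from collections import defaultdict
from fractions import Fraction

TRUE_TEMPS_C = {5: 16, 10: 32, 15: 16}   # true temperature (°C) -> number of readings
ERROR_STEPS = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}  # in graduations

def c_to_f(c):
    """Exact Centigrade-to-Fahrenheit conversion."""
    return c * Fraction(9, 5) + 32

def build_plot(true_temps=TRUE_TEMPS_C):
    """Joint frequencies {(F reading, C reading): count}, errors independent."""
    plot = defaultdict(Fraction)
    for t, n in true_temps.items():
        for df, pf in ERROR_STEPS.items():
            for dc, pc in ERROR_STEPS.items():
                plot[(c_to_f(t) + 9 * df, t + 5 * dc)] += n * pf * pc
    return plot

def array_means(plot):
    """Mean Centigrade reading within each array of a single Fahrenheit reading."""
    total, weighted = defaultdict(Fraction), defaultdict(Fraction)
    for (f, c), n in plot.items():
        total[f] += n
        weighted[f] += n * c
    return {int(f): float(weighted[f] / total[f]) for f in sorted(total)}

print(array_means(build_plot()))
```

The means come out at 5°, 7.5°, 10°, 12.5° and 15°C. for readings of 32°, 41°, 50°, 59° and 68°F., so the line through them rises only half as steeply as the true correspondence between the scales.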

Since the mean of the Centigrade readings, which are associated 
with Fahrenheit readings of 32°, is 5°C., it is said that 5°C. is the most 
probable reading on the Centigrade scale which will be found associated 
with a reading of 32° on the Fahrenheit scale. Or, in other words, if a 
65th reading is made on the Fahrenheit scale, under the same conditions²
and this is a reading of 32°, and it is desired to predict what will
be the reading of the same temperature made on the Centigrade scale 
by the other individual, the best prediction is a reading of 5°C. It is 
in this way that the regression line is used in prognosis. Similarly 
since the mean of the Centigrade readings found associated with a 
reading of 68°F. is 15°C., it is said that given a Fahrenheit reading of 
68°, the most probable Centigrade reading which will be associated 
with it under the same conditions is 15°C. 

Why the Regression Line Does Not Show True Correspondence.—Why 
is it, however, that the mean Centigrade reading found associated 
with readings of 32°F. is 5°C. when the Centigrade value corresponding 
to 32°F. is known to be 0°C.? The answer is as follows: In the first
place all 4 of these readings of 32°F. are in error downwards by hypothesis
since the actual temperature read in each case was 41°F., as
shown in Plots A and E.

1 The distribution of values on one scale associated with a single value on the
other scale is called an array.
2 The meaning of this expression will be brought out later.

Now 41°F. is the same as 5°C. so one would naturally expect the 
average of readings of the 4 temperatures of 5°C. to be 5°C. 

Similarly the mean of the array of Centigrade readings found 
associated with readings of 41°F. is at 7.5°C. but this is not the same 
temperature as 41°F. which is only 5°C. And, as before, the explana- 
tion is that of these 16 readings of 41°F., 8 were correct readings of 
actual temperatures of 41°F. and 8 were incorrect readings of 50°F. 
The average of these 16 actual temperatures is 45.5°F. and this is the 
same temperature as 7.5°C. As before, one would naturally expect 
that the mean reading of actual temperatures averaging 7.5°C. would 
be 7.5°C. 

The mean of the array of Centigrade readings found associated 
with readings of 50°F. is 10°C. and 10°C. = 50°F. This case differs 
from the preceding in that 50°F. happens to be the mean of all the 
Fahrenheit readings and consequently the mean of the 24 actual 
temperatures read as 50°F. was exactly 50° which equals 10°C. so 
naturally the mean Centigrade reading of these temperatures would be 
expected to be 10°C. 

Going up the scale we find the mean Centigrade reading found 
associated with readings of 59°F. is 12.5°C. instead of 15°C. which 
equals 59°F. and we find the mean Centigrade reading found associated 
with readings of 68°F. is 15°C., whereas 68°F. corresponds to 20°C. 
The chief point to be noted in this connection, however, is that if we 
did not know in advance what number of degrees Centigrade denoted 
the same temperature as 32°F. we could not find it by taking the mean 
of the array of Centigrade readings found associated with readings of 
32°F. for the obvious reason that the number of degrees Centigrade 
denoting the same temperature as 32°F. is 0° while the mean Centi- 
grade reading found associated with readings of 32°F. is 5°C. 

The same is true all the way up the scales with the single exception 
of 50°F. in this particular case, because it is the mean of all the Fahren- 
heit readings. The procedure which should be adopted to find the 
Centigrade reading corresponding to any given Fahrenheit reading 
will be described later. 

The Meaning of Regression.—It will be seen that instead of the
means of the arrays of Centigrade readings found associated with each
of the Fahrenheit readings 32°, 41°, 50°, 59°, and 68° being respectively
0°, 5°, 10°, 15°, and 20°C. to correspond, they were in reality
5°, 7.5°, 10°, 12.5° and 15°C.

The mean values of Centigrade readings found associated with each
of the Fahrenheit readings tend to be nearer to the mean (10°) of all
the Centigrade readings than are the Centigrade values to which these
Fahrenheit readings correspond.

It is said that the means of these arrays of Centigrade readings
regress (fall back) toward the mean (10°) of all the Centigrade readings.
That is why the line is called a “regression line.”

There are Two Regression Lines.—In the same way it may be seen 
that the means of the arrays of the Fahrenheit readings corresponding 
to the several Centigrade readings regress toward the mean (50°) of 
these Fahrenheit readings so that if a line is drawn in a plot through 
these means it will take the position shown at N in the Plot H. This 
is the other regression line, there being two in every such case, one 
through the mean of the vertical arrays and one through the mean of 
the horizontal arrays. 

A Generalization.—We may now make a very general statement
and say that whenever x and y values are plotted and do not correlate
perfectly, the mean of every array of y values associated with any
single value of x is nearer to the mean of all the y values than is the
value of y which truly corresponds to that single value of x.

Effect of Shifting Distributions.—It should be noted that while the
true value of the temperatures read as 68°F. was in this particular
case 15°C., nevertheless if the 16, 32 and 16 temperatures had been at
50°, 59°, and 68°F. respectively the mean of the true values of the
temperatures then read as 68°F. would have been 17.5°C. And if the
64 temperatures had been at 59°, 68° and 77°F. the mean of the true
values of temperatures then read as 68° would have been 20°C. This
means that if the regression line were used in the effort to determine the
true Centigrade value corresponding to 68°F., this would be found to
be 15°C. in one case, 17.5°C. in another, and 20°C. in another.

The value of one variable which will most probably be found asso- 
ciated with a given value of the other variable varies therefore accord- 
ing to the general position of the values investigated on the scales. 
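The dependence on the position of the group can be worked out directly. The sketch below takes each of the three placements of the 16, 32 and 16 temperatures in turn and, using the same error assumption as before (half the Fahrenheit readings correct, a quarter 9° too high, a quarter 9° too low), finds the mean true temperature of those readings recorded as 68°F. The function name is illustrative.

```python
from fractions import Fraction

# Fahrenheit reading errors: half correct, a quarter one graduation (9°) off each way.
ERRORS_F = {-9: Fraction(1, 4), 0: Fraction(1, 2), 9: Fraction(1, 4)}

def mean_true_c(read_f, true_temps_f):
    """Mean true temperature, in °C, of the readings recorded as read_f °F."""
    count = weighted = Fraction(0)
    for t, n in true_temps_f.items():
        for err, p in ERRORS_F.items():
            if t + err == read_f:
                count += n * p
                weighted += n * p * t
    mean_true_f = weighted / count
    return float((mean_true_f - 32) * Fraction(5, 9))

placements = (
    {41: 16, 50: 32, 59: 16},   # the case of Plot A
    {50: 16, 59: 32, 68: 16},   # shifted up one graduation
    {59: 16, 68: 32, 77: 16},   # shifted up two graduations
)
for temps in placements:
    print(sorted(temps), mean_true_c(68, temps))
```

Under these assumptions the three placements give 15°, 17.5° and 20°C. respectively, so the value "associated with" 68°F. agrees with the true equivalent of 20°C. only when 68°F. happens to be the mean of the group.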

Use of the Regression Line in Mental Testing.—Now let us see the 
significance of this statement as applied to mental measurement. 
Suppose we have tested a group of Grade XII pupils with Forms 
A and B of a Mental Ability Test, and wish to find the most probable 
score a pupil will have made (or will make) in Form B who has made a
score of 100 in Form A. This is done by means of the regression line
which indicates the theoretical mean of the B scores found associated
with an A score of 100. Although the B score truly corresponding
to the A score of 100 might be also 100, the mean of the associated
B scores might be 110, showing that a pupil in this group making
a score of 100 in Form A would most probably have made a score of
110 in Form B. This is because 100, being a low score for such a
group, is most probably in error downwards. And it may be said also 
in the case of any other Grade XII pupil who has taken Form A only, 
but has made 100 points, that insofar as he is typical of the Grade XII 
pupils of the group considered, he, too, will most probably make a 
score of 110 in Form B. This merely amounts to saying that if a 
typical Grade XII pupil makes a score of 100 points in this test, his 
score is most probably in error by 10 points downward, and that this 
error tends to be corrected in his second score. 

On the other hand, if a group of Grade V pupils were tested with the 
same two forms, A and B, then by means of the regression line in the 
new plot it might be found that the mean of the B scores found asso- 
ciated with A scores of 100, was only 90, showing that a Grade V child 
who made a score of 100 in Form A will most probably have made a 
score of 90 in Form B. This is because a score of 100, being for
a fifth grader a high score, is most probably in error upwards. And of
any other Grade V pupil who has made a score of 100 in Form A it 
may be said that if he is typical of the fifth graders who took both 
forms, he too will most probably make a score of 90 in Form B. This 
merely amounts to saying that if a Grade V pupil makes a score of 100 
the probability is that his score is in error by 10 points upward, and 
that in a second score this error tends to be corrected. 

The regression line therefore shows the most probable true score in 
a second test which a pupil would obtain who made a given score in 
a first test, the score in the first test being in error. The regression 
line therefore does not show the true correspondence between true 
scores in both scales. 
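The two cases may be put in numbers. The text gives only the scores 100, 110 and 90; the group means, standard deviations and correlation used below are hypothetical values chosen so that the regression estimate reproduces those figures, and the function name is likewise illustrative.

```python
def predicted_form_b(score_a, mean_a, mean_b, sd_a, sd_b, r_ab):
    """Regression estimate of the Form B score for a given Form A score."""
    return mean_b + r_ab * (sd_b / sd_a) * (score_a - mean_a)

# Hypothetical group statistics (not from the article): both forms are given
# the same mean and spread within a grade, and the forms correlate 0.5.
grade_xii = dict(mean_a=120.0, mean_b=120.0, sd_a=20.0, sd_b=20.0, r_ab=0.5)
grade_v = dict(mean_a=80.0, mean_b=80.0, sd_a=20.0, sd_b=20.0, r_ab=0.5)

print(predicted_form_b(100, **grade_xii))   # 110.0
print(predicted_form_b(100, **grade_v))     # 90.0
```

The same Form A score of 100 thus leads to two different expected Form B scores, according to the group from which the pupil is drawn, although the score in Form B truly corresponding to 100 in Form A is the same in both groups.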

How May the True Correspondence be Found?—We come now to the 
problem of finding the true line of relation between two variables 
when we have before us only the plot such as Plot G showing the 
incomplete correspondence between the two variables. 

Let us go back to Plot A and trace the evolution of the standard
deviations of the two variables. In Plot A, $\sigma_F$ (the standard deviation
of the 64 F. readings) $= 9\sqrt{1/2}$ and $\sigma_C$ (the standard deviation of
the 64 C. readings) $= 5\sqrt{1/2}$. Next we assumed that errors which
occurred in the Fahrenheit readings were distributed thus:


Errors . . . . .  -9,  0, +9
Frequency . . .  16, 32, 16


Here, $\sigma_{eF}$ (the standard deviation of the errors of Fahrenheit readings)
$= 9\sqrt{1/2}$.¹ We assumed also that errors which occurred in Centigrade
readings were distributed thus:


Errors . . . . .  -5,  0, +5
Frequency . . .  16, 32, 16


Here, $\sigma_{eC}$ (the standard deviation of errors of Centigrade readings) $=
5\sqrt{1/2}$.

Variabilities of Observed Measures are Proportional to Variabilities
of True Measures.—It will be seen that the magnitudes of the errors
made in the two scales (as measured by their standard deviations
$\sigma_{eF}$ and $\sigma_{eC}$) have the same ratio (9:5) as the standard deviations of the
true temperatures themselves in the two scales. That is, $\sigma_{eF} : \sigma_{eC} :: \sigma_F : \sigma_C$.
This is for the obvious reason that an error of 9 degrees on the
Fahrenheit scale equals an error of 5 degrees on the Centigrade scale.
The effect of these errors is such therefore that the standard deviation
of the observed measures on the Fahrenheit scale is 9/5 of the standard
deviation of the observed measures on the Centigrade scale. Or, to
put it the other way round, the ratio of the standard deviations of the true
Fahrenheit and Centigrade measures is the same as the ratio of the standard
deviations of the observed Fahrenheit and Centigrade measures which is as
9:5.

The Correspondence between Means.—As has been shown, the mean 
of the whole distribution of values of either variable does not tend to 
be in error either upward or downward and therefore the mean of the 
whole distribution of values of one variable probably truly corresponds 
to the mean of the whole distribution of values of the other variable. 

The Relation Line.—Going back to Plot H, then, if we wish to find 
the true correspondence between Fahrenheit and Centigrade values, 
we must draw a line through the point representing the mean (50) of 
all the Fahrenheit readings and the mean (10) of all the Centigrade 
readings, such that for every 9 units on the horizontal scale the line 
rises 5 units on the vertical scale. This is the line R. The line R, 
then, expresses the true relation between the Fahrenheit and Centigrade 
scales and is called the Relation Line. 
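The construction of the line R can be verified from the observed readings alone. The sketch below, under the same error assumptions as the hypothetical case, forms the distribution of observed readings on each scale, computes the two means and standard deviations, and draws the line through the means with slope equal to the ratio of the observed standard deviations. The function names are illustrative.

```python
import math
from fractions import Fraction

TRUE_TEMPS_C = {5: 16, 10: 32, 15: 16}   # true temperature (°C) -> number of readings
STEPS = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}  # error in graduations

def observed_readings(unit, offset):
    """Distribution {reading: frequency} on a scale whose graduation is `unit` degrees."""
    dist = {}
    for t, n in TRUE_TEMPS_C.items():
        for step, p in STEPS.items():
            reading = offset + unit * (Fraction(t, 5) + step)
            dist[reading] = dist.get(reading, 0) + n * p
    return dist

def mean_sd(dist):
    """Mean and standard deviation of a frequency distribution."""
    total = sum(dist.values())
    mean = sum(v * n for v, n in dist.items()) / total
    var = sum(n * (v - mean) ** 2 for v, n in dist.items()) / total
    return float(mean), math.sqrt(var)

mean_f, sd_f = mean_sd(observed_readings(unit=9, offset=32))   # Fahrenheit readings
mean_c, sd_c = mean_sd(observed_readings(unit=5, offset=0))    # Centigrade readings

def relation_line(f):
    """Centigrade value on the relation line R for a Fahrenheit value f."""
    return mean_c + (sd_c / sd_f) * (f - mean_f)

for f in (32, 41, 50, 59, 68):
    print(f, round(relation_line(f), 6))
```

The observed standard deviations come out in the ratio 9:5, and the line through the means (50, 10) with that slope carries 32°F. to 0°C. and 68°F. to 20°C., which the regression line through the array means does not.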

1 There is no necessary connection between this and $\sigma_F$.

A Further Generalization.—We may now make the general statement
that whenever two measures, x and y, of the same trait (such as
scores in two tests of the same ability) are not perfectly correlated,
and there is no evidence that one test is any more reliable than the
other, the line which most probably represents the true relationship
between the two measures is the line $y = \frac{\sigma_y}{\sigma_x} x$ when the means of the
values of the variables have been taken as the zero points from which
to measure the variables. This is the line drawn through the point
representing the means of the two groups of measures and through the
point S representing $+1\sigma$ in each distribution and through the point
T representing $-1\sigma$ in each distribution, as shown in Fig. 1. Stated
in other words, the true correspondence between such measures is 
probably such that the mean of the measures of one variable equals 
the mean of the measures of the other variable, and the standard 
deviation of the observed values of one variable represents the same 
increment of ability as the standard deviation of the observed values 
of the other variable. 

In this proof there is an underlying assumption throughout that the 
two scales by which the variables are measured are so constructed that 
the relationship is rectilinear, by which is meant that the units of one 
scale bear a constant relation to the units of the other scale throughout, 
so that the true line of relation is a straight line. 

Cases in which the line of relation is not straight must be dealt
with as discussed on page 125 of the article referred to and also in the
Reliability of Spelling Scales, School and Society, October 28-November
18, 1916.


SECOND PROOF 


Hypothesis.—Let us suppose we have two mental ability tests, 
X and Y. 

Let $X_1$, $X_2$, $X_3$, etc., represent the scores obtained in Test X by the
different individuals, and $Y_1$, $Y_2$, $Y_3$, etc., represent the scores obtained
by the same individuals in Test Y. Thus, X without a subscript
represents any score obtained in Test X, and Y represents any score
obtained in Test Y.

Let $x_1$ represent the mean of a very large number of scores of the
first individual in Test X and be considered, therefore, as the true score
of that individual in Test X. Let $x_2$, $x_3$, etc., represent similarly the
true scores of the other individuals in Test X.

Let $X_1 - x_1 = e_1$, $X_2 - x_2 = e_2$, etc. The value e, therefore, is
the amount by which the obtained score of any individual differs from
his true score as defined. Similarly, let $Y_1 - y_1 = f_1$, $Y_2 - y_2 = f_2$,
etc. Generally speaking then $X - x = e$ and $Y - y = f$.

The variables e and f may be considered as errors of measurement,¹
and obviously they are totally uncorrelated with each other and with
x and y.

For the sake of simplicity let us assume that the values of X, x, Y,
and y, are measured from their respective means so that

$$\sum X = 0,$$

and the same for x, Y, and y.


The quantities, e and f, will be sometimes positive and sometimes
negative and we may assume them to be distributed normally in each
case with the mean at zero, in which case the mean of the X values is
the same point on the X scale as the mean of the x values, and the
same for Y and y.


While we have spoken of Tests X and Y as both being mental
ability tests, it is not certain, of course, that the traits measured by
the two tests are absolutely identical. In other words, $r_{xy}$, the correlation
between what we have called true scores in Test X and true
scores in Test Y, may be slightly less than +1.00. But, for the time
being, let us assume that $r_{xy} = +1.00$ and later we will consider the
case in which $r_{xy} < +1.00$.

1 There are, of course, influences affecting scores in a mental ability test, such
as varying degrees of effort, etc., which are theoretically distinguished from
mental ability itself but which nevertheless may be correlated, either positively
or negatively, with mental ability as defined. Thus it is conceivable that dull
pupils might try harder to score well in a mental ability test than bright pupils, so
that effort might correlate negatively with mental ability in a certain group. But
in so far as effort is correlated either one way or the other with mental ability, just
to that extent the test score measures effort (or the opposite of effort) as well as
mental ability and equality of effort will tend to make for equality of score in the
same way that equality of mental ability does, although, of course, to a lesser
extent. In other words, mental ability as measured is not mental ability as defined,
and when we speak of the reliability of a test, we mean the consistency of its scores,
the degree to which two scores of the same individual correspond. In that sense
all factors which contribute consistently to the score and thereby operate to make
the two scores of the same individual in the same test equal are to all practical
purposes part of the ability tested, and we may as well consider the effects of those
factors which cause two scores of the same individual in the same test to differ
as being to all practical purposes errors of measurement. This is not essential
to the proof, however.

Let us suppose it is desired to find the most probable relation 
between true values of x and true values of y. In other words, let 
us suppose it is desired to find the relation between two values, x and 
y, when these measure the same amount of the trait. 

It will now be shown¹ that this relation is expressed by the equation

$$y = \sqrt{\frac{r_{YY}}{r_{XX}}}\,\frac{\sigma_Y}{\sigma_X}\,x \tag{A}$$

in which $r_{XX}$ is the reliability coefficient of variable X, and $r_{YY}$ is the
reliability coefficient of variable Y.

Proof.—Assuming that $r_{xy} = +1.00$, let $y_1 = mx_1$, $y_2 = mx_2$,
$y_3 = mx_3$, etc. The constant, m, is the ratio, therefore, of the units
of the two scales; and the tangent of the angle of the line which represents
the true correspondence between measures of the two scales is
therefore equal to m.

$$
\begin{aligned}
\text{If}\quad y &= mx && (1)\\
\text{then}\quad y^2 &= m^2 x^2 && (2)\\
\sum y^2 &= m^2 \sum x^2 && (3)\\
\sigma_y^2 &= m^2 \sigma_x^2 && (4)\\
\frac{\sigma_y^2}{\sigma_x^2} &= m^2 && (5)\\
m &= \frac{\sigma_y}{\sigma_x} && (6)
\end{aligned}
$$


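Of course, we do not know the values of $\sigma_x$ and $\sigma_y$ because these are
standard deviations of true scores which we cannot obtain but it will be
shown now how to find the value of $\frac{\sigma_y}{\sigma_x}$ from the values of $\sigma_X$, $\sigma_Y$, $r_{XX}$,
and $r_{YY}$, which can be found.

$$
\begin{aligned}
\text{By definition,}\quad X &= x + e && (7)\\
\text{Squaring,}\quad X^2 &= x^2 + 2ex + e^2 && (8)\\
\text{Summating,}\quad \sum X^2 &= \sum x^2 + 2\sum ex + \sum e^2 && (9)\\
\text{Now by the formula for correlation,}\quad r_{ex} &= \frac{\sum ex}{\sqrt{\sum e^2 \sum x^2}} && (10)
\end{aligned}
$$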


1 It should be borne clearly in mind that it is not sought to prove that this equa- 
tion is to be used to find the most probable value of Y that will be found associated 
with a given value of X, nor that it is to be used to find the most probable true 
measure, in terms of a Y scale, of the trait in an individual who has attained a 
given measure, X, in another scale. This formula is not to be used for prediction 
or for estimating true values in one scale from obtained values in another. For 
these purposes the regression equation should be used. 


540 The Journal of Educational Psychology 


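$$
\begin{aligned}
\text{But by hypothesis,}\quad r_{ex} &= 0 && (11)\\
\text{Therefore,}\quad \sum ex &= 0 && (12)\\
\text{From equations 9 and 12,}\quad \sum X^2 &= \sum x^2 + \sum e^2 && (13)\\
\text{Whence,}\quad \sigma_X^2 &= \sigma_x^2 + \sigma_e^2 && (14)\\
\text{or}\quad \sigma_x^2 &= \sigma_X^2 - \sigma_e^2 && (15)
\end{aligned}
$$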


Equation 14 shows that the square of the standard deviation of a
distribution of true scores is augmented by the introduction of errors to
the extent of the square of the standard deviation of the distribution of errors.

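Now by a formula¹ devised by the writer,

$$r_{XX} = 1 - \frac{\sigma_e^2}{\sigma_X^2} \tag{16}$$

in which $r_{XX}$ is the reliability coefficient of correlation between scores
in Test X, and e has the same meaning as used above.

$$
\begin{aligned}
\text{Now by equation 16,}\quad r_{XX}\,\sigma_X^2 &= \sigma_X^2 - \sigma_e^2 && (17)\\
\text{By equation 15,}\quad \sigma_x^2 &= \sigma_X^2 - \sigma_e^2 && (18)\\
\text{Therefore,}\quad \sigma_x^2 &= r_{XX}\,\sigma_X^2 && (19)\\
\text{and}\quad \sigma_x &= \sqrt{r_{XX}}\,\sigma_X && (20)
\end{aligned}
$$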


This equation constitutes a formula for finding the standard devia- 
tion of true scores of a group of individuals from the standard deviation 
of the obtained scores of those individuals, knowing the reliability 
coefficient of correlation obtained from the same group of individuals. 
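Equations 14, 16 and 20 can be tried on the thermometer readings of the First Proof, where the true values are known. In the sketch below the standard deviations of the true Fahrenheit temperatures and of the Fahrenheit errors are both computed from the frequencies 16, 32 and 16 at one graduation (9°) below, at, and above the mean. The function names are illustrative.

```python
import math

def reliability(sd_error, sd_observed):
    """Equation 16: r_XX = 1 - sigma_e^2 / sigma_X^2."""
    return 1.0 - (sd_error / sd_observed) ** 2

def true_score_sd(r_xx, sd_observed):
    """Equation 20: sigma_x = sqrt(r_XX) * sigma_X."""
    return math.sqrt(r_xx) * sd_observed

# Deviations of -9, 0, +9 with frequencies 16, 32, 16 give a variance of 81/2.
sd_true_f = 9 * math.sqrt(0.5)     # true Fahrenheit temperatures (Plot A)
sd_error_f = 9 * math.sqrt(0.5)    # errors of the Fahrenheit readings

# Equation 14: observed variance = true variance + error variance.
sd_observed_f = math.hypot(sd_true_f, sd_error_f)

r_ff = reliability(sd_error_f, sd_observed_f)
print(round(sd_observed_f, 4), round(r_ff, 4), round(true_score_sd(r_ff, sd_observed_f), 4))
```

The observed standard deviation is 9, the reliability is one half, and equation 20 returns the standard deviation of the true temperatures, about 6.36, from the observed figures alone.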


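$$
\begin{aligned}
\text{Similarly}\quad \sigma_y &= \sqrt{r_{YY}}\,\sigma_Y && (21)\\
\text{Therefore,}\quad \frac{\sigma_y}{\sigma_x} &= \sqrt{\frac{r_{YY}}{r_{XX}}}\,\frac{\sigma_Y}{\sigma_X} && (22)
\end{aligned}
$$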


Now the equation of the line which represents the true correspon- 
dence between scores in Tests X and Y, as shown in equation 6, is 


$$y = \frac{\sigma_y}{\sigma_x}\,x \tag{23}$$

By Equation 22 this equation becomes

$$y = \sqrt{\frac{r_{YY}}{r_{XX}}}\,\frac{\sigma_Y}{\sigma_X}\,x \tag{24}$$

This then is the equation of the line which represents the true 
correspondence between scores in Tests X and Y, assuming that true 
scores in these two tests measure identical traits. 





1 This is the same formula as equation 1, page 140 of the article entitled, The 
Reliability of the Binet Scale and Pedagogical Scales, Journal of Educational 
Research, September, 1921. 
The Correspondence between Two Forms of a Test.—Now if we are
dealing with two “forms” of the same test, the presumption is that one
form is just as reliable as the other, in which case we may assume
that $r_{XX} = r_{YY}$ and hence $\frac{r_{YY}}{r_{XX}} = 1$.

It is reasonable to assume also that the correlation between true
scores in the two forms is practically perfect, that is, the two forms may
be assumed to measure identical traits, so we may call $r_{xy}$ equal to
+1.00. In this case therefore, Equation 24 becomes simplified, so
that the equation¹ of the line which most probably represents the true
correspondence between the scores of the two forms of a test is

$y = \frac{\sigma_Y}{\sigma_X}\,x$ (25)


The way Equation 25 is used is as follows: Suppose it is desired to 
find the correspondence between scores in Form A of the Otis Higher 
Examination given as an initial test and Form B of the same examina- 
tion given a week later, so that scores in Form B, so given, could be 
transmuted into terms of Form A, so given, for comparative purposes. 
Both forms would be given to the same group of individuals, Form A 
first and Form B a week later. Let us suppose the mean of the Form
A scores is found to be 50 points and the mean of the Form B scores
to be 52 points. Let us suppose $\sigma_A$, the standard deviation of the
scores in Form A, is found to be 11 points, and $\sigma_B$, 10 points. We
would then assume that 50 points in Form A, so given, corresponds
to 52 points in Form B, so given, and that, measuring the scores from
their respective means, any score in Form B equals $\frac{10}{11}$ the corresponding
score in Form A.²
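
The conversion just described can be sketched in Python as follows. The means (50 and 52) and standard deviations (11 and 10) are those of the example above; the function names are hypothetical and serve only to illustrate Equation 25 applied to deviations from the means.

```python
def form_a_to_form_b(score_a, mean_a=50.0, mean_b=52.0, sd_a=11.0, sd_b=10.0):
    """Convert a Form A score to the corresponding Form B score.

    The deviation from the Form A mean is scaled by sd_b / sd_a
    (Equation 25) and added to the Form B mean.
    """
    deviation_a = score_a - mean_a
    return mean_b + (sd_b / sd_a) * deviation_a


def form_b_to_form_a(score_b, mean_a=50.0, mean_b=52.0, sd_a=11.0, sd_b=10.0):
    """Inverse conversion: express a Form B score in terms of Form A."""
    deviation_b = score_b - mean_b
    return mean_a + (sd_a / sd_b) * deviation_b


# A Form A score one standard deviation above its mean (61 points)
# corresponds to a Form B score one standard deviation above its mean.
converted = form_a_to_form_b(61.0)
restored = form_b_to_form_a(converted)
```

Because the rule is a straight line through the two means, converting a score to the other form and back again returns the original score, apart from rounding.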


The Case in Which Tests Do Not Measure Identical Traits.—Now let
us consider the case in which $r_{xy} < +1.00$, that is, the case in which the

1 When we are considering the correspondence between scores, we are referring 
of course to true scores. When we say, for example, that 50°F. corresponds to 
10°C., we mean of course that a true temperature of 50°F. corresponds to a true 
temperature of 10°C., not that some temperature erroneously read as 50°F. 
corresponds to some temperature erroneously read to 10°C. Similarly, when we 
speak of the correspondence between scores in Tests X and Y we refer to the corres- 
pondence between true scores, x and y. For that reason the equation of 
the line is given in a form expressing the correspondence between true scores, x 
and y, in terms of obtained scores, X and Y. 

2 This method is suitable, of course, only in case it is assumed that the
relationship is rectilinear.
true score of an individual in Test X does not measure exactly the
same combination of traits as the true score of the individual in Test Y. 
In what sense, then, may there be a true correspondence between 
scores in Tests X and Y? It would seem that there can be a true 
correspondence only with respect to the measurement of that trait or 
group of traits which is measured by both tests. 

Now the true score x of any individual in Test X, as defined above, 
will differ slightly from the true score that he would obtain in Test X 
if the effect of certain factors specific to Test X were cancelled so that 
the score in Test X was affected only by factors which affected a score 
in Test Y also. 

Let this difference in score in Test X be represented by s. 

Let a similar difference in score in Test Y be represented by t. 

Let g represent the true score (average of a large number of scores)
of an individual in Test X when the effect, s, of factors specific to Test
X is cancelled; that is, when $\sigma_s = 0$. According to these definitions,





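$x = g + s$ (26)

Let $y = h + t$ (27)
From Equation 26, $x^2 = g^2 + 2gs + s^2$ (28)
and $\sum x^2 = \sum g^2 + 2\sum gs + \sum s^2$ (29)

but $r_{gs} = \frac{\sum gs}{\sqrt{\sum g^2 \sum s^2}} = 0$¹ (30)
whence $\sum gs = 0$ (31)
Therefore $\sum x^2 = \sum g^2 + \sum s^2$ (32)
and $\sigma_x^2 = \sigma_g^2 + \sigma_s^2$ (33)
or $\sigma_g^2 = \sigma_x^2 - \sigma_s^2$ (34)
Similarly, $\sigma_h^2 = \sigma_y^2 - \sigma_t^2$ (35)

Now, as in Equation 16, $r_{xy} = 1 - \frac{\sigma_s^2}{\sigma_x^2}$ (36)
Multiplying by $\sigma_x^2$, $r_{xy}\sigma_x^2 = \sigma_x^2 - \sigma_s^2$ (37)
Now by Equations 34 and 37, $\sigma_g^2 = r_{xy}\sigma_x^2$ (38)
Similarly, $\sigma_h^2 = r_{xy}\sigma_y^2$ (39)

Therefore $\frac{\sigma_h^2}{\sigma_g^2} = \frac{\sigma_y^2}{\sigma_x^2}$ (40)

and $\frac{\sigma_h}{\sigma_g} = \frac{\sigma_y}{\sigma_x}$ (41)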


This equation shows that the ratio of the standard deviations of the 
true scores in Tests X and Y (true scores being now defined as scores in 


1 Since s factors are specific to x by hypothesis, therefore $r_{hs} = 0$. And by
hypothesis $r_{gh} = +1.00$. Therefore $r_{gs} = 0$.
which the effect of all factors not common to both tests has been
neutralized) is equal to the ratio of the standard deviations of the true 
scores as previously defined. 

Now the true scores (g and h) in Tests X and Y (as measures of the 
same trait) are of course perfectly correlated so that each value of h 
is some constant times the corresponding value of g. Let us represent 
this constant by m. 


Then $h = mg$ (42)
$h^2 = m^2 g^2$ (43)
$\sum h^2 = m^2 \sum g^2$ (44)

$m^2 = \frac{\sum h^2}{\sum g^2}$ (45)

and $m = \frac{\sigma_h}{\sigma_g}$ (46)


The value of m is by definition the tangent of the angle of the line of 
true correspondence between scores in Tests X and Y as measures of 
the same trait. The equation of the line is therefore 


$h = \frac{\sigma_h}{\sigma_g}\,g$ (47)

Substituting in this equation the value of $\frac{\sigma_h}{\sigma_g}$ found in Equation 41,
the equation of the line becomes

$h = \frac{\sigma_y}{\sigma_x}\,g$ (48)

Substituting in this equation the value of $\frac{\sigma_y}{\sigma_x}$ found in Equation 22,
the equation of the line becomes

$h = \sqrt{\frac{r_{YY}}{r_{XX}}}\;\frac{\sigma_Y}{\sigma_X}\,g$ (49)

We might as well do away with the ultrafine distinction, however,
between g and x and between h and y and let x and y represent the true
scores in Tests X and Y as measures of the same trait, thereby getting
back to familiar symbols. In that case Equation 49 becomes

$y = \sqrt{\frac{r_{YY}}{r_{XX}}}\;\frac{\sigma_Y}{\sigma_X}\,x$ (50)

in which the values of all the variables are measured, of course, from
their respective means.
Application of the Formula.—Equation 50 would be used in the 
following way: Suppose it is desired to find the true correspondence 
between scores in the Binet Scale and the Otis Higher Examination.
Call these Tests X and Y. Suppose these tests to have been administered
to the same group of individuals. Suppose the standard
deviations ($\sigma_X$ and $\sigma_Y$) of scores in the two tests by this group are 15
and 18 respectively and suppose the reliability coefficients of correlation
($r_{XX}$ and $r_{YY}$) obtained with this same group¹ to be 0.90 and 0.80
respectively. The correspondence between scores will be expressed
by the following equation:

Otis score (measured from mean) = $\sqrt{\frac{0.80}{0.90}} \times \frac{18}{15} \times$ Binet score
(measured from mean) (51)
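
The arithmetic of this example may be checked with a short Python sketch. The standard deviations (15 and 18) and reliabilities (0.90 and 0.80) are the figures supposed above; the function names are hypothetical, introduced only to illustrate Equation 50.

```python
import math


def correspondence_slope(sd_x, sd_y, r_xx, r_yy):
    """Equation 50: slope of the line of true correspondence."""
    return math.sqrt(r_yy / r_xx) * (sd_y / sd_x)


def otis_deviation(binet_deviation, slope):
    """Otis score (from its mean) corresponding to a Binet score (from its mean)."""
    return slope * binet_deviation


# Figures supposed in the text: Binet is Test X, Otis is Test Y.
slope = correspondence_slope(sd_x=15.0, sd_y=18.0, r_xx=0.90, r_yy=0.80)

# A Binet score 10 points above the Binet mean.
otis_points_above_mean = otis_deviation(10.0, slope)
```

With these figures the slope is a little over 1.13, so a Binet score 10 points above its mean corresponds to an Otis score somewhat more than 11 points above its mean.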
If the reliabilities of the two tests are not known or for some other reason
are considered as equal, Equation 50 becomes, of course, simply:

$y = \frac{\sigma_Y}{\sigma_X}\,x$ (52)
Derivation of the Regression Equation.—Now suppose variable X
is a measure of age or some quantity not subject to errors of measurement
so that we may call $r_{XX}$ equal to +1.00. Then the correspondence
between X and Y (Equation 50) becomes:

$y = \sqrt{r_{YY}}\;\frac{\sigma_Y}{\sigma_X}\,x$ (53)

Now it may be shown that if $r_{xy} = +1.00$, $\sqrt{r_{YY}} = r_{XY}$. Equation
53 then becomes

$y = r_{XY}\,\frac{\sigma_Y}{\sigma_X}\,x$ (54)


This, of course, is the regular regression equation, showing that to 
find the score corresponding to (or normal for) any age, we may use 
the line of regression, that is, the line passing through the central 
tendencies of the arrays of scores for the several ages. 
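
The step from Equation 53 to Equation 54 can be illustrated numerically. The sketch below assumes the familiar attenuation relation, $r_{XY} = r_{xy}\sqrt{r_{XX}\,r_{YY}}$, which the text invokes without writing out; the function names and sample figures (a reliability of 0.81 for Test Y and standard deviations of 2 and 18) are hypothetical.

```python
import math


def observed_correlation(r_true, r_xx, r_yy):
    """Attenuation relation (assumed here): r_XY = r_xy * sqrt(r_XX * r_YY)."""
    return r_true * math.sqrt(r_xx * r_yy)


def correspondence_slope(sd_x, sd_y, r_xx, r_yy):
    """Equation 50: slope of the line of true correspondence."""
    return math.sqrt(r_yy / r_xx) * (sd_y / sd_x)


# Hypothetical figures: X is age (no error of measurement, r_XX = 1),
# and true scores in X and Y are perfectly correlated (r_xy = 1).
r_yy = 0.81
sd_x, sd_y = 2.0, 18.0

r_xy_observed = observed_correlation(r_true=1.0, r_xx=1.0, r_yy=r_yy)
slope_eq_53 = correspondence_slope(sd_x, sd_y, r_xx=1.0, r_yy=r_yy)
slope_eq_54 = r_xy_observed * sd_y / sd_x  # ordinary regression slope

# Under these two assumptions the two slopes coincide.
assert math.isclose(slope_eq_53, slope_eq_54)
```

If either assumption is relaxed (an unreliable X, or true scores that are not perfectly correlated), the two slopes will in general differ.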


A CORRECTION 


In the May, 1922, number of this journal there appeared an article 
by the writer entitled, A Method of Inferring a Change in a Coefficient 


1 If the reliability coefficient of correlation for either test has been determined
using a group of a different heterogeneity from the present group it will be necessary
to correct the coefficient for this difference in heterogeneity by a method explained
in an article by the writer entitled, A Method of Inferring the Change in a Coefficient
of Correlation Resulting from a Change in the Heterogeneity of the Group,
Journal of Educational Psychology, May, 1922. (See correction below.)
of Correlation Resulting from a Change in the Heterogeneity of the
Group. In this article the last equation (not numbered) is an error.
This equation should read:

$r'_{xy} = 1 - (1 - r_{xy})\,\frac{\sigma_x^2}{\sigma_x'^2}$

in which $r'_{xy}$ and $\sigma_x'^2$ refer to one degree of heterogeneity of the group
and $r_{xy}$ and $\sigma_x^2$ refer to the other degree of heterogeneity of the group.

The application of this method is as follows. 

Suppose $r_{xy}$, the correlation between Forms A and B of a test in
Grade VI, is 0.75.

Suppose $r'_{xy}$, the correlation between Forms A and B of the same
test in a group combining Grades IV, V, VI, VII, and VIII, is sought.

Suppose $\sigma_x$, the standard deviation of scores in Form A in Grade
VI, is 40 points.

Suppose $\sigma'_x$, the standard deviation of scores in Form A in the group
combining the five grades, is 50 points.

Then $r'_{xy} = 1 - (1 - 0.75)\,\frac{40^2}{50^2} = 0.84.$


It may be remembered simply that the deviation of the coefficient 
from unity varies inversely as the square of the variability of the group. 
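
The corrected formula is easily checked with a few lines of Python. The figures are those of the example above (a correlation of 0.75 where the standard deviation is 40, sought where it is 50); the function name is a hypothetical one chosen for illustration.

```python
def correlation_in_new_group(r_known, sd_known, sd_new):
    """Infer the correlation in a group of different heterogeneity.

    The deviation of the coefficient from unity is scaled by the ratio of
    the squared standard deviations: 1 - (1 - r) * sd_known^2 / sd_new^2.
    """
    return 1.0 - (1.0 - r_known) * (sd_known ** 2 / sd_new ** 2)


# Grade VI alone: r = 0.75, SD = 40. Grades IV to VIII combined: SD = 50.
r_combined = correlation_in_new_group(r_known=0.75, sd_known=40.0, sd_new=50.0)
```

The result agrees with the 0.84 given above. When the two standard deviations are equal the coefficient is left unchanged, as it should be.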
IV. Intelligence and Progress in High School Subjects.—We can now
turn to our data on specific high school subjects, and ask what intelli- 
gence is required for success in each. In general, the figures show 
that the present courses in algebra make as great a demand on intelli- 
gence as do those in any one subject taken in the freshman year, and 
this will be reported as a sample study. More failures occur; more 
pupils drop out. There is a correspondence between the score a pupil 
makes on a general intelligence test, and the probability of his “passing”
the course. There is a closer correspondence between his score
on a test designed to measure mathematical ability, or ability to learn 
algebra, and the probability of his passing the course. It is worth 
while, if vocational or educational guidance is to be given, or if sections 
are to be made up on the basis of probable progress, to have both of 
these measures. 

Pupils who elect algebra, or who choose a course including algebra, 
are in general a more intelligent group than those who do not; pupils 
who pass in algebra are in general a more intelligent group than those 
who do take it but fail. The groups overlap considerably, but the 
one is definitely better than the other. The graphs in Figs. 5 to 10 
show the distribution of Alpha scores of pupils passing in algebra, of 
those who fail, and of those who do not take algebra. The contrast 
between the median scores of those who “pass” algebra and those who
fail, or do not take it, is quite striking. In Alma, the median Alpha 
score of freshmen who passed in algebra was 94, while the median of 
those who failed was 78. In Mt. Clemens the corresponding medians 
were, for those who passed algebra, 107; for those who failed in algebra, 
89, and, still more significant, for those who did not take algebra, 69. 
In Mt. Pleasant the median Alpha score of the pupils who passed alge- 
bra was 89; of those who failed, 65. In Milan, the median Alpha 
score of the pupils who passed was 86, while that of those who failed 
was 75. In Detroit (Terman Group Test of Mental Ability), the median 
score of those who passed was 94 and of those who failed 84. 
Fig. 5.—Distribution of Alpha Scores of pupils who passed, failed, or did not take algebra. Alma, Michigan.

Fig. 6.—Distribution of Alpha Scores of pupils who passed, failed, or did not take algebra. Mt. Pleasant, Michigan.

Fig. 7.—Distribution of Alpha Scores of pupils who passed, failed, or did not take algebra. Mt. Clemens, Michigan.

Fig. 8.—Distribution of Alpha Scores of pupils who passed, failed, or did not take algebra. Milan, Michigan.

Fig. 9.—Distribution of Alpha Scores of pupils who passed, failed, or did not take algebra. Four Michigan schools.

Fig. 10.—Distribution of Scores on the Terman Group Test of Mental Ability of pupils who passed, failed, or did not take algebra. Detroit, Michigan.
Our Michigan data have been analyzed to indicate also the expectation
of failure when the Alpha score is below 55, 55 to 74, etc.
Tables XII and XIII show this, for the schools separately, and for the 
combined data. 


TABLE XII.—PER CENT OF FRESHMEN TAKING ALGEBRA AT EACH LEVEL WHO
FAILED IN ALGEBRA

Alpha score level  | Mt. Clemens | Milan | Mt. Pleasant | Alma | Total
[illegible]        | 25          | 0     | 0            | 0    | 5
[illegible]        | 25          | 0     | 0            | 12   | 11
[illegible]        | [illegible] | 6     | 0            | 9.6  | 10
[illegible]        | 47          | 7     | 6            | 16.7 | 14
[illegible]        | [illegible] | 7     | 23           | 21   | 20
[illegible]        | 0           | 14    | 0            | 33   | 20
Median Alpha score | 89.2        | 75    | 67.5         | 79   | 79.6





TABLE XIII.—PER CENT OF FRESHMEN AT EACH LEVEL WHO DID NOT TAKE
ALGEBRA

| Mt. Clemens | Milan | Mt. Pleasant | Alma | Total
Re re eee 0 | 0 a 
ees r | 14 0 | an | O 7 on 
FTE Sees: a. 6 | oO ae 2 oe 
ee a a dot Oe 3 | 6 3.6 6 
ee ere. | 25 0 4 0 8 
NING S125 a5 ais wae: bir 4. ss 0 16 
— f; See ad ae 
Median Alpha score........ 69.4 67.5 | 87.5 | 85 | 77.5 





Another way in which to look at this relationship is through the 
correlation of algebra marks with Alpha scores. This varies very 
much from school to school, as it does with other school subjects, 
according to the content and method of the course, the skill of the 
teacher in motivating and in teaching both dull and bright pupils, in 
judging of their acquirements and their progress, and in assigning 
marks in keeping with these. Were these at their highest, and the 
Alpha examination a ‘‘perfect’’ measure of ‘‘general intelligence” 
the correlation would be closer—though never 1, since Alpha even then 
would doubtless be far from an exact measure of the specialized type of 
intelligence for which algebra calls. The coefficients actually obtained
from the Michigan data vary from +0.15 to +0.47, centering around 
+0.35. 

In the course of this work a number of persons familiar with high 
school classes in algebra have been asked to estimate in terms of intelli- 
gence quotient the degree of intelligence necessary to complete fresh- 
man algebra successfully. The various estimates run as follows: 
110, 110, 110, 105 to 110, 110 (this last was for an accelerated Grade 
VIII class in algebra). An intelligence quotient of 110 on the Stan- 
ford Revision of the Binet scale, at the age of 14 years (that is to say, a 
mental age of 15-5) corresponds to an Alpha score of almost 100 (98.5).

It would seem a safe conclusion that a pupil who scores from 100 
to 110 (or better) on the Army Alpha examination should be fairly
sure of the possibility of success in the usual course in algebra, as at
present taught in academic high schools. This means a mental age of
15-6 to 16-2, and, if the child begins algebra at 14, an IQ of 110 to 115.
Below this, success becomes increasingly doubtful. For success in a
high school course in which the subjects were for the most part definitely
less difficult than algebra, these figures might be lowered by from
10 to 15 points. Proctor mentions 95 as a minimum IQ (Alpha
score 67). Probably in 90 cases out of 100, it is unwise to guide the
average or less intelligent than average child into the present academic 
high school. Unless his IQ is over 100, or his mental age definitely 
over 14, he should be encouraged to try some other type of training. 
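
The mental ages and intelligence quotients quoted above are tied together by the usual ratio definition of the IQ (mental age divided by chronological age, times 100). A brief Python sketch of that arithmetic follows; the helper names are hypothetical, and mental ages written as "15-6" are read as years and months.

```python
def months(years, extra_months=0):
    """Convert an age written as years-months (e.g. 15-6) to months."""
    return 12 * years + extra_months


def intelligence_quotient(mental_age_months, chronological_age_months):
    """Ratio IQ: 100 * mental age / chronological age."""
    return 100.0 * mental_age_months / chronological_age_months


age_14 = months(14)

# Mental ages of 15-6 and 16-2 for a child beginning algebra at 14.
iq_low = intelligence_quotient(months(15, 6), age_14)
iq_high = intelligence_quotient(months(16, 2), age_14)

# Mental age of 15-5 at age 14, the case cited for an IQ of 110.
iq_cited = intelligence_quotient(months(15, 5), age_14)
```

These come to roughly 110.7 and 115.5, which the text gives in round figures as an IQ of 110 to 115; the mental age of 15-5 gives very nearly 110.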

Additional evidence on the school progress of children who make
low scores on Alpha may be gained from the comments made by 
Detroit high schools on their seniors who scored less than 85. We give 
these without omission though it should be noted that there were some 
cards which bore no comment. 

Score Comment


82 Industrious, but little ability. 

75 Fair record. Industrious. 

55 Very slow at studies, but capable in administrative work; fine character; 
studying nursing now. 

62 A colored girl, faithful, good typist, fair in Domestic Science. 

79 Lack of application. Too much interested in Girl Scout work; that her 
sole interest. Entered junior college. 

59 Peculiar case; she did not trust herself. Often depended on others. 

72 Very slow, very timid, very faithful and plodding. 

79 Not the brightest, but made of good stuff. Making good in bank. 

57 Sub-normal. 


78 Probably never developed her powers. Calm and easy-going disposition. 
75 Low ideal. Under group influence. Did not study out of school till fourth
year. Now in junior college.

68 Lack of work. 

78 Was not interested in school. 

73 A fair student. 

81 Dull. 

61 Weak in English. 

84 Generally weak. A cripple. 


V. Geographical Differences in Intelligence.—In any application of
these findings concerning our intelligence, and the proportion of us to 
whom an academic high-school education or, for instance, the study of 
algebra is an advantage, it must be remembered that smaller parts of 
the country—sections, states, communities—depart widely from these 
general figures. How unbelievably large these geographical differences 
may be, between states and even whole sections of the United States, 
may be illustrated from the data concerning the draft which appear in
Volume XV of the Memoirs of the National Academy of Sciences, and 
data about medical officers in Bulletin 8 of the National Research 
Council. This information is not highly accurate, especially for the 
draft. The scores are for those recruits only who were examined by
means of the Alpha examination, that is, those who were considered to
be adequately measured by it, or “literate.” Since the literacy standard
was not identical in different camps, and since in a few cases it
had been impossible to reexamine recruits who should have been
reexamined by a non-verbal or individual method, and they were
improperly included in this “Alpha only” group, there are known to be
inaccuracies in the data. It is known, for instance, that the poor 
showing of the New Jersey recruits is at least partly due to this cause. 

These inaccuracies undoubtedly compensate one another to some 
extent when the states are grouped, and larger areas are considered. 
For this purpose the following grouping has been used: 


NORTHEAST         ATLANTIC                SOUTHERN
Maine             New Jersey              Georgia
New Hampshire     Pennsylvania            Florida
Vermont           Delaware                Alabama
Massachusetts     Maryland                Mississippi
Rhode Island      Virginia                Louisiana
Connecticut       West Virginia           Arkansas
New York          District of Columbia    Oklahoma
                                          Texas
                                          New Mexico

SOUTH CENTRAL     NORTH CENTRAL           CENTRAL
North Carolina    Ohio                    Indiana
South Carolina    Michigan                Illinois
Kentucky          Minnesota               Iowa
Tennessee         Wisconsin               Missouri
                  North Dakota            Kansas
                  South Dakota            Nebraska

WESTERN
Oregon
Washington
Montana
Idaho
Wyoming
California
Nevada
Utah
Arizona
Colorado
TABLE XIV

                   |                Draft                     |        Medical officers
Section            | Median | Median,       | Median,         | Median | Lowest | Highest
                   |        | lowest state¹ | highest state¹  |        | state¹ | state¹
I Northeast        | 67     | 62            | 7[illegible]    | 139    | 138    | 144
II Atlantic        | 60     | 49*           | 66              | 126    | 118    | 132
III Southern       | 47     | 41            | 60              | 115    | 108    | 125
IV South Central   | 45     | 42            | 47              | 102    | 98     | 104
V North Central    | 62     | 57            | 64              | 135    | 132    | 139
VI Central         | 62     | 56            | 66              | 126    | 123    | 188
VII Western        | 75     | 64            | 80              | 140    | 137    | 144














Table XIV gives the median score for each section, for the draft and
for medical officers. In order to show how closely the median for the
section is representative of the states included in the group, the median





1 Medians derived from 10 cases or fewer are here disregarded. 


* This is New Jersey and is undoubtedly too low because of inclusion of 
illiterates, as previously explained. 
Fig. 11.—Distribution of Alpha Scores of recruits from the northeast section.
Fig. 12.—Distribution of Alpha Scores of recruits from the Atlantic section.
Fig. 13.—Distribution of Alpha Scores of recruits from the southern section.
Fig. 14.—Distribution of Alpha Scores of recruits from the south central section.
Fig. 15.—Distribution of Alpha Scores of recruits from the north central section.
Fig. 16.—Distribution of Alpha Scores of recruits from the central section.
Fig. 17.—Distribution of Alpha Scores of recruits from the western section.
Fig. 18.—Comparison of Alpha Scores of a southern and a western state.
for the lowest state and for the highest state in the group also are
given. 

The distributions from which these medians are derived are pic- 
tured in Figs. 11 to 17. 

The evidence from the draft (40,530 cases) and from the group of 
medical officers (2507 cases), is in quite close agreement in indicating 
striking differences between different parts of the country; the rank 
order, and with one or two exceptions the relative size of the differ- 
ences, also correspond closely. 

It is obvious that these large differences in the intelligence of the
population in different states have very important implications for
education. Consider the comparison indicated in Fig. 18, which shows 
the distribution of scores for one southern (A) and one western (B) 
state. Suppose we make the very probable assumption that a neg- 
ligible number of persons who would as adults score less than 65 on 
Alpha will ever as children enter an academic high school. This 
means, in state A, that not over 25 per cent of the population needs to 
be provided for in freshman classes of such high schools; but in state B, 
64 per cent should be accommodated. It means that the distribution 
of school funds to schools of different types should be very different in 
these two states. Again, in state A it is unlikely that more than 12 
per cent, or about half the students who will attempt an academic 
course, will be able to finish the course and graduate. In state B, 
44 per cent, or about two-thirds of all who enter, should be capable 
of completing the course, and should be provided for to the end of the 
course. That is, the proportion of ninth-year to twelfth-year students 
is likely always to be different in the two states; and again, a different 
apportionment of funds, this time within the school, is indicated. 
Further, in state A, not more than about 4 per cent of all the school 
children—about 1 in 6 of those who enter the academic high school— 
are likely to profit by taking algebra, as now taught. In state B, 
about 24 per cent or more than 1 in 3 of those who enter high school, 
may profit from the present algebra course. Therefore the subjects 
offered, or at least the number of pupils provided for, in each will need 
to be quite different in two such states.
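
The proportions cited for the two states may be verified with a few lines of Python. The percentages are those given in the paragraph above; the dictionary layout and the function name are hypothetical conveniences.

```python
# Per cent of the whole population, as stated in the text.
states = {
    "A": {"enter": 25, "graduate": 12, "algebra": 4},
    "B": {"enter": 64, "graduate": 44, "algebra": 24},
}


def fraction_of_entrants(state, stage):
    """Share of those entering academic high school who reach a given stage."""
    figures = states[state]
    return figures[stage] / figures["enter"]


graduating_a = fraction_of_entrants("A", "graduate")  # "about half"
graduating_b = fraction_of_entrants("B", "graduate")  # "about two-thirds"
algebra_a = fraction_of_entrants("A", "algebra")      # "about 1 in 6"
algebra_b = fraction_of_entrants("B", "algebra")      # "more than 1 in 3"
```

The fractions are 0.48 and about 0.69 for graduation, and 0.16 and 0.375 for algebra, in keeping with the round statements in the text.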

Moreover, since in some of the southern states probably as many as 
75 per cent of the children can not or will not enter academic high 
schools, the problem of providing other and perhaps new types of train- 
ing for children from 14 to 18 years of age is most acute in this part of 
the country. There is here a fertile field for pioneer work in originating
a curriculum which will fit their needs, for discovering what these
children can and should be taught and what methods of presentation 
best reach them. What can best replace the academic curriculum for 
these children, to yield satisfaction in their own lives and enable them 
to become satisfactory citizens of a democracy? When educational 
authorities in the south see this as peculiarly their problem, and, with 
the increased federal aid which is coming, direct their efforts to solving 
it in their own way for their own region, rather than adopting the 
solutions of progressive western states where the proportions if not 
the conditions of the problem are quite different, we may expect new 
developments in secondary education which will command the atten- 
tion of all. 


SUMMARY 


1. Though Terman and others have indicated that mentality limits 
school achievement, few measurements are available to show just how
fast or how far children of given mental equipment can progress in our 
schools. 

2. The intelligence of the high school population in this country is 
limited to approximately the upper half of the whole range of American 
intelligence. 

3 and 4. Intelligence is an important factor in determining the 
number of years a youth spends in school and college. The minimum 
intelligence usually necessary in order to enter high school is repre- 
sented at age 14 by an Alpha score of 65; the minimum usually 
necessary to achieve high school graduation is represented at age 14 by 
a score of 85 points; the minimum for profit from present high school 
algebra is about 105. 

5. Geographical differences in intelligence are enormous, the 
median for the lowest state being only half as great as that for the high- 
est. In certain states more than half the population is below the 
level apparently necessary for academic high school work; in others, 
three-fourths of the population may be expected to enter high school. 
This has an important bearing on the distribution of school funds. 
ADDITIONAL DATA FROM CONSECUTIVE STANFORD-BINET TESTS


BIRD T. BALDWIN 
AND 
LORLE I. STECHER 


State University of Iowa 


This article presents supplementary data as a result of further 
tests by the Stanford Revision of the Binet Scale of 143 cases reported 
by the writers in January, 1922.¹ Just one year later 32 of the 36 cases


TABLE I.—COEFFICIENTS OF CORRELATION FOR IQ’s. BOYS AND GIRLS

Examination number |      1       |      2       |      3       |      4       |      5
2                  | +.850 ± .031 |              |              |              |
3                  | +.738 ± .051 | +.846 ± .031 |              |              |
4                  | +.779 ± .044 | +.802 ± .040 | +.910 ± .019 |              |
5                  | +.817 ± .037 | +.815 ± .037 | +.839 ± .033 | +.918 ± .017 |
6                  | +.812 ± .038 | +.751 ± .049 | +.796 ± .041 | +.866 ± .028 | +.944 ± .012








who had received five previous examinations had a sixth; 40 of the 42 
cases who had had four previous examinations had a fifth; 41 of the 51
cases who had had three previous examinations had received a fourth; 
31 of the 56 cases who had had two examinations had received a third; 
64 additional cases with two examinations were included. 

These new data confirm the findings of the previous study that for 
practical purposes the IQ remains sufficiently constant for a group as a 
whole, but that the individual records show fluctuations which are 
smoothed out in obtaining general averages. The amount of these 
fluctuations is evident in the tables of original data in the previous 
study, pp. 24-29, which have been brought up to date in mimeo- 
graphed form and may be had on application to the writers. 


1 Baldwin, B. T. and Stecher, L. I.: The Mental Growth Curve of Normal and 
Superior Children Studied by Means of Consecutive Intelligence Examinations. 
Univ. of Iowa Studies in Child Welfare, 1922 (2), No. 1, pp. 61. 
The inter-correlations with examinations, for those who have
IQ’s (given in Table I), show the distribution of the individuals within
this group on subsequent tests. The correlation between the fifth and 
sixth examination is the highest (+0.944), which probably means 
that the individuals have become thoroughly stabilized within the 
group. 

The writers have previously analyzed the sort of growth curve 
that results from the repeated application of the Stanford Revision. 
This curve represents one aspect of mental growth when measured by 
an existing tentative scale. Additional data permit the calculation 
(by the same method previously used) of the figures of Table II, the 
mean mental age in months for each sex at each age of children of 
superior and of average mental ability. 

Chart 1 shows these data in graphic form. The curves have in 


TABLE II.—MEAN MENTAL AGE IN MONTHS OF SUPERIOR AND AVERAGE BOYS
AND GIRLS FOR SUCCESSIVE CHRONOLOGICAL AGES (BASED ON CONSECUTIVE
EXAMINATIONS)

                  |          Boys           |          Girls
Chronological age | IQ 110+    | IQ 90-110  | IQ 110+    | IQ 90-110
                  | (superior) | (average)  | (superior) | (average)
5                 | 72         | 61         | 73         | 62
6                 | 89         | 76         | 86         | 73
7                 | 103        | 87         | 101        | 88
8                 | 121        | 100        | 119        | 96
9                 | 134        | 112        | 131        | 114
10                | 146        | 124        | 145        | 123
11                | 160        | 132        | 160        | 134
12                | 181        | 141        | 184        | 147
13                | 191        | 156        | 200        | 159
14                | 205        | 167        | 205        | 174
15                | 213        | 180        | 216        | 180
16                | 212        | 201        | 221        | 198

















general the same appearance as those in the previous study with the 
exception of the curve for the average girls which lies much closer to the 
average boys curve than formerly, probably due to the addition of 
more average girls at this age. The average curves are approximately 
straight lines, which shows that these children are comparable to those 
on whom the scale was standardized. In contrast with the straight-
line average curves, the superior curves show fluctuations at the 
adolescent ages, indicative of the earlier mental development of 
superior children. Both the superior and the average girls of this 
group are in advance of the boys at the adolescent ages—12 to 14— 
when measured by this scale.¹ As previously pointed out, this
adolescent spurt is analogous to the adolescent acceleration so fre- 
quently found in physical growth curves in height, weight, breathing 
capacity and other physical traits. 
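
The comparison of girls with boys at ages 12 to 14 can be read directly from Table II; the short Python sketch below does so. The mental ages (in months) are copied from the table, while the data layout and function name are hypothetical.

```python
# Mean mental age in months at chronological ages 12, 13, and 14 (Table II).
mental_age = {
    ("boys", "superior"): {12: 181, 13: 191, 14: 205},
    ("girls", "superior"): {12: 184, 13: 200, 14: 205},
    ("boys", "average"): {12: 141, 13: 156, 14: 167},
    ("girls", "average"): {12: 147, 13: 159, 14: 174},
}


def girls_lead(group):
    """Months by which girls exceed boys at each age, for one ability group."""
    girls = mental_age[("girls", group)]
    boys = mental_age[("boys", group)]
    return {age: girls[age] - boys[age] for age in sorted(girls)}


lead_superior = girls_lead("superior")
lead_average = girls_lead("average")
```

For the average group the girls lead by 6, 3, and 7 months at the three ages; for the superior group they lead by 3 and 9 months at 12 and 13, and the two sexes stand level at 14.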

Unfortunately we have not, in the present state of development of 
the science, any measuring instrument that at all approximates the 
apparatus for measuring physical growth. The cheapest measuring 
stick is superior, both in equality of units and in extent, to our mental 
measurement scales. These poor mental tape lines wrinkle and
stretch in places, and someone has cut off a little from both ends! 
The unit of measurement in mental growth scales is not an absolute 
unit such as the centimeter or the kilogram. The writers are in 
hearty agreement with the author² of a somewhat facetious review in
regard to the desirability of discovering such an absolute unit of mental 
growth. An inch of growth in height is the same between 5 and 6 
years or between 12 and 13 years. There is good reason to believe, 
however, that 2 months mental growth may mean a very different 





1 This conclusion has recently received some support from the evidence of
Sullivan and Murdock (Journal of Educational Psychology, 1922, Vol. 13, 350-362). 

2 Sandiford, P.: Journal of Educational Psychology, 1922 (13) 378-379. The joint 
authors of this study, which the reviewer attributes mainly to one of them, take 
this opportunity to correct a few misapprehensions. (1) In view of the discussion 
above, there can be no objection to the plotting of mental age curves in regard to 
which the reviewer seems to have such a serious complex. (2) The reviewer 
comments on the fact that the authors believe the curves to be straight. That 
this is not the case is shown by the quotation (p. 12), “further analysis reveals,
however, a very significant change in the trend with the approach of adolescence.
This is especially marked in the curve for girls, etc.” (3) The mental age curves
and the IQ curves are, indeed, as the reviewer has aptly put it, “the same thing
plotted in a different fashion.” Although both are approximately straight lines,
“there are fluctuations associated with physical development” (in the IQ curve)
and “there is a significant change in the trend with the approach of adolescence”
(in the mental age curve), surely not, as the reviewer states, “diametrically
opposite conclusions.” (4) The authors presume that the reviewer failed to find
one or two real errors which they now desire to point out. On page 12, beginning
with line 25, one should read, “At 6 years −1 month, +11 months at the rate
of 1.38 or 104 + (11 × 1.38) or 119.18.” Other proof-reading errors will be found
on pages 12 and 17.
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thing at these two periods. The amount of mental growth for 2 
mental months at the earlier age may be only half that of 2 mental 
months at the later age. We do not know. We assume that the 
difficulty of the tests within the scale takes this into consideration and 
meets the differences fairly accurately. By the very fact of such con- 
struction, however, mental age scales tend to conceal any differences 
in the rate of mental growth that may exist. If any adolescent 
acceleration appears, it is all the more significant. Even the discovery 
of this hypothetical absolute unit of mental growth will not provide a 
scale for measuring mental growth, because mental growth like 
physical growth is a complex process involving development in a 
diversity of traits and functions. For example, physical growth is 
measured in inches, pounds, square inches, cubic inches, and a large 
number of other units for strength, temperature and metabolism 
measurements. It is possible to get some idea of the individual’s 
development from a measurement of the height or the weight alone, 
but a complete growth curve is the result of composite measurements. 
That the writers have already pointed out this fact in the earlier study 
is shown by the following quotation (page 58), ‘‘Theoretically it would 
seem to be a better measure of mental growth to use a combination of 
point scales for specific mental traits, each scale to be sufficiently 
extended to measure whatever ability exists and the whole system to 
include a sufficient variety of traits to afford a general measure of the 
development of the individual.” 

















A METHOD OF MEASURING FATIGUE OF THE EYES¹
RALPH E. WAGER 
Emory University, Ga. 


Several factors are to be taken into account in connection with any 
method designed to measure fatigue of the eyes. There are involved: 
(1) The retina, (2) the refracting mechanism, and (3) the internal and 
external musculature. 

Variations in retinal sensitivity may be followed in most cases by 
laboratory tests which need not be here described. It seems to be well 
established, however, that the purely nervous elements of the bodily 
mechanism are the last to suffer losses under adverse physical condi- 
tions. Not only so, but such tests do not lend themselves well to the 
measurement of fatigue because (1) wide variations occur within 
narrow time-limits, (2) central factors seem to be largely involved, 
and (3) certain physiological activities, such as the vaso-motor waves, 
or even the act of breathing, seem to influence the results. 

The refracting mechanism may, in ordinary laboratory procedure 
at least, be regarded as a constant since changes take place in it but 
slowly and usually as a result of advancing years. Such changes are 
not within the range of the field covered by fatigue. 

There remains, then, the musculature as the most significant 
factor. It is commonplace that muscular capacities are subject to 
marked changes due to physiological causes. This fact is obvious 
both through experience and experiment. The amount of change is in 
many cases measurable. One thinks, in this connection, of the work of 
Mosso and his epochal ergograph. Such recoverable losses in muscular
capacity as are commonly experienced by all active muscles may be 
regarded as due to fatigue, and are accounted for as the result of the 
accumulation within the substance of the tissues of certain katabolic 
products. Recovery of power is due to either the transformation or 
elimination of these substances. Their presence, however, in suffi- 
ciently large amounts, brings about a lessened capacity to do work, 
largely, it appears, because of certain positively or negatively charged 
ions whose effect is to prevent the passage of an adequate stimulation 
into the muscle substance. In certain pathological cases it is likely
that exhaustion of the energy-furnishing substances may also take
place, but normally the inhibiting ions are developed in sufficient
amount to manifest their effect before this condition is reached.

¹ A summary of a dissertation submitted to the faculty of the Graduate School
of Arts, Literature and Science in the University of Chicago. The author
acknowledges indebtedness to Dr. F. N. Freeman for suggestions and criticisms,
and to many friends who assisted in the investigation.

It is probable, therefore, that changes in ocular powers are due 
largely to changes in muscular capacities. The muscles involved are 
(1) the ciliaris, which controls the accommodative reactions, and 
(2) the external muscles which function in the acts of convergence and 
divergence. Both sets of muscles are brought into play when one 
shifts his field of regard from a point in a near plane to one more
distant, or vice versa.

The method of measuring fatigue herein to be described is based 
upon the last made assumption. Its claim for merit lies chiefly in the 
fact that it is largely objective, and requires but little training or pre- 
vious experience with the apparatus in order to be effectively used. 
It can be used with children who have learned to read. 

The discussions to follow will be presented under the following 
topics: 

1. Description of the apparatus.
2. Description of the tests and the manner of application.
3. Typical cases and discussion of results.
4. General conclusions.


DESCRIPTION OF THE APPARATUS 


The method, as has been suggested, is based upon the assumption 
that the most likely measure of ocular fatigue may be had by attempt- 
ing to determine losses or gains in ocular muscular capacities resulting 
from rapid shifts of the field of regard so that both accommodation and 
convergence and divergence are necessitated. Such are involved 
when a change or shift in the field of regard is made from a near to a 
more distant plane, or the reverse, as has been suggested above. 

In order to compel such shifts alternately from one plane to another, 
two hard-rubber discs, 6 and 8 inches in diameter respectively, were 
mounted on a steel rod 48 inches long. In front of each disc was a 
shield sufficiently large completely to cover it, and bearing in its 
upper margin an opening 1 inch square. Both shield and disc were 
movable along the rod whereon they might be secured at any point 
by means of a set-screw. The rod with its discs was supported upon 
suitable tripods and other accessory parts. 

By means of parts which need not here be described in detail, the 
rod, with its attached discs, could be caused to rotate through a fraction
of a turn with each impulse applied to a foot-pedal. Each fractional
rotation was through 1/24th of a turn, or 15 degrees. These movements
were controlled in extent, and made rapid and vibrationless, by
means of suitable devices secured to, and operating against, a third 
disc attached to the distal end of the rod above mentioned. This 
latter was outside the point of attachment to the supporting tripod, 
and, by means of a lever and appropriate ratchets playing into two 
sets of toothed discs, controlled the partial rotations and at the same 
time, by means of pins set in its margin, made an electric circuit which 
served to register the instant of the movement of the discs. 

Attached to the two discs first mentioned were paper forms of the 
same size, bearing on their margins printed words, either singly or in 
groups, so disposed as to bring each behind the opening on the shields 
at each partial rotation. These words were of different sized type for 
the near and distant plane so that the image formed upon the retina 
would be approximately the same size in each case. 
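
The proportion between type size and viewing distance implied here may be sketched as follows. This is only an illustration, assuming simple similar-triangle geometry (equal visual angle giving approximately equal retinal image size); the function name and the sample type height are hypothetical and are not taken from the apparatus described.

```python
def matched_type_height(near_height, near_distance, far_distance):
    """Height of distant type that subtends the same visual angle as near type.

    Assumes the type is viewed head-on, so that height scales in direct
    proportion to distance. All lengths must be in the same unit.
    """
    if near_distance <= 0 or far_distance <= 0:
        raise ValueError("distances must be positive")
    return near_height * far_distance / near_distance


# Hypothetical example: type 0.1 inch high on the near disc at 9 inches,
# with the distant disc placed at 36 inches.
far_height = matched_type_height(0.1, 9, 36)
print(f"distant type height: {far_height:.2f} inch")
```

On this assumption, type on a disc four times as far away would need to be about four times as tall.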

The apparatus was placed upon tables so that, in the line of the two 
discs, a subject might be seated with his eyes about on a level with the 
openings in the shields. A suitable head-rest was provided. The 
near disc was placed at a distance of 8 to 10 inches from the eyes, and 
the more distant from 36 to 52 inches, depending in each case upon the 
accommodative capacities of the subject as determined by experiment. 
Shifts were then made in the planes as will be described below. 

It is advisable at this point to make clear the manner in which the 
sequential fixations and accommodations were utilized as a basis of 
measuring changes in capacities. 

It is evident that two essential factors are involved: (1) The 
rapidity with which the necessary muscular reactions are brought about, 
and (2) the accuracy with which they are accomplished. This latter 
involves the element of acuity, or correctness with which perception 
takes place. Each of these factors is taken into account in a manner 
next to be described. 

The speed, or rapidity, with which the muscular adjustments were 
accomplished, was measured by (1) a voice-key, interposed between 
the subject and the near disc, and insulated against vibrations by being 
placed upon a sand-filled pedestal, and (2) the pegs on the rear disc
(control) which served to make an electric circuit with each partial 
rotation. The voice-key was of the Roemer type, modified in a 
manner to make it automatic in its resetting, and exceedingly sensitive. 
The appearance of a word behind the opening of the shield was the
stimulus for its being read aloud as quickly as perception made it
possible. The response of the voice-key to the sound waves made an 
electric circuit which recorded the instant of the subject’s perception 
as evidenced by the spoken word, and the control disc made the con- 
tact which caused the registration of the instant of the appearance 
of the word, as above noted. Records were made on smoked paper 
in the form of a long belt stretched over two drums which were driven 
by a 149 h.p. motor with suitable reducing and controlling elements to 
give a speed adapted to the purpose. A triple time marker was used. 
One element was operated by the make-circuit of the voice-key, 
another by the make-circuit on the control-disc, and the third by a 
Jacquet chronometer, placed in a circuit, for the purpose of giving a 
suitable time-record against which the other two might be projected 
and intervals determined. The time-record was made in fifths of 
seconds. The instant of appearance of the stimulus word being 
recorded, together with the subject’s response thereto, it became possi- 
ble to measure the period required for the fixation-accommodation act 
incident to it. This will be made clearer in connection with the 
explanation given below.
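
The reading of intervals from the record may be sketched as follows. This is a minimal illustration, assuming that the position of each mark has already been read off against the chronometer line in fifths of a second; the function name and the sample readings are hypothetical.

```python
FIFTH_OF_SECOND = 0.2  # the chronometer line is marked in fifths of a second


def reaction_intervals(stimulus_marks, response_marks):
    """Return the elapsed time, in fifths of a second, for each stimulus-response pair.

    stimulus_marks: positions of the control-disc marks (word appears).
    response_marks: positions of the voice-key marks (word spoken).
    """
    if len(stimulus_marks) != len(response_marks):
        raise ValueError("each stimulus needs exactly one response")
    intervals = []
    for shown, spoken in zip(stimulus_marks, response_marks):
        if spoken < shown:
            raise ValueError("a response cannot precede its stimulus")
        intervals.append(spoken - shown)
    return intervals


# Hypothetical readings from the time line, in fifths of a second.
stimulus = [0.0, 20.0, 41.5]
response = [4.5, 25.0, 47.5]
fifths = reaction_intervals(stimulus, response)
seconds = [f * FIFTH_OF_SECOND for f in fifths]
print(fifths)
print(seconds)
```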

Acuity was taken into account by noting the number of correct and
incorrect responses. This will be referred to below. 


DESCRIPTION OF THE TESTS AND THEIR APPLICATION 


Two forms of tests were finally adopted as best suited to the 
purpose of the undertaking. These need to be described. 

The first was called the 2-1 type. In it 24 words were used on the 
distant paper disc, and 12 on the near, each alternate space remaining 
blank. In applying the test, a word appeared in the opening of the 
rear shield, the near one being blank. Perception of the distant stimu- 
lus was followed by a partial rotation which brought a word behind 
each opening. The distant was then perceived and an immediate
shift in the field of regard took place for the purpose of reacting to the 
near one. Another fractional rotation restored the original condition, 
following which the process was repeated until the complete series was 
run off. A portion of a record made in a test of this sort is shown in 
Fig. 1. The letter A indicates the registration of the instant of per- 
ception of the distant word; the partial rotation following is indicated 
at 1; the recognition of the new word in the distant plane is recorded at 
B, and the following in the near plane at C. The succeeding change in 
stimuli is marked by 2, which change restores the original condition.

Fig. 1.

Fig. 2.

Tabulations from a record of the above type are made in Table I
in those parts indicated as I, Ia, III and IIIa. Referring to I, the
numbers in the column marked a are the times elapsing (in fifths of
seconds) between the appearance of the stimulus word in the distant
plane and its perception (1 to B); those in the column marked b, the
times for the fixations and accommodations necessary for the recogni-
tion of the word in the near plane (B to C); those in the column
indicated by c, the times for the adjustments for the new word in the
distant plane (C to A), the latter including also the reaction time of the
operator (C to 2), which is given separately in column d. As will be
described below, averages of each of these series of measures are
required. In deriving that for column c, either the total of d is sub-
tracted from c, or the average of d taken from that of c so that the
effect of the operator’s reaction time is eliminated. The averages
of the columns (the function of which will be discussed below) are
indicated as well as the average of the averages.
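
The averaging just described may be sketched as follows, using the twelve readings of Part I of Table I. The function name is hypothetical, and the sketch assumes that the operator's reaction time is removed by taking the average of d from the average of c, which is one of the two equivalent procedures named above.

```python
from statistics import mean

# Columns a, b, c and d of Part I of Table I, in fifths of a second.
col_a = [4.5, 5, 6, 7, 8.5, 5, 5.5, 5.5, 7, 14, 5.5, 6]
col_b = [9, 7, 10, 16, 7, 7.5, 8, 11.5, 9, 10, 9, 12]
col_c = [10.5, 24, 9, 10, 11, 16, 18, 11, 9, 9, 11, 9.5]
col_d = [3, 16, 3, 5, 2, 2.5, 2, 4, 3, 2.5, 2.5, 3]


def two_one_average(a, b, c, d):
    """Average of the column averages for a 2-1 type test.

    Column c includes the operator's reaction time, which is recorded
    separately in column d, so the average of d is taken from that of c.
    """
    corrected_c = mean(c) - mean(d)
    return mean([mean(a), mean(b), corrected_c])


print(round(two_one_average(col_a, col_b, col_c, col_d), 1))  # 8.2
```

The result agrees with the average of 8.2 entered for Part I of the table.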

The second form of test used was called the 1-5-5 type. Its 
advantage (and difference) over the other lay chiefly in the fact that 
the adjustments in each plane had to be maintained for a longer time. 
On the rear disc a single word alternated with a series of five words. 
On the near disc, series of five words alternated with blanks. The test 
began with the recognition of a single word in the distant plane, the 
near opening being blank. A partial rotation followed, bringing behind 
each opening a group of five words. The more distant ones were read 
in as rapid succession as possible after which a shift in the field of 
regard took place to the near plane for the purpose of perceiving 
the words therein appearing, and which were likewise read as rapidly 
as possible. Another partial rotation brought a single word in the 
distant plane, thus restoring the original condition. Figure 2 is a
reproduction of a portion of a record made in connection with a test of
this type. It is interpreted as follows: C marks the time of recogni-
tion of the single word in the distant plane. The succeeding fractional
rotation is timed at 1; the responses to the five words in the distant
plane are indicated at B-B-B-B-B, and to those in the near plane at
A-A-A-A-A. The following partial rotation is timed at 2, and 
restores the original condition. 


TABLE I

Subject Ph.
-2 lens
Distances: 8 and 36 inches
Stimulus lists:
   I.  D and P          II.  5y and 5c        III.  E and L
   Ia. [?] and Q        IIa. [?] and 5a       IIIa. A and R

                       I.                                  II.
            (a)    (b)    (c)    (d)        (a)    (b)    (c)    (d)    (e)
            4.5      9   10.5      3          8     25     12     23      8
              5      7     24     16        6.5     20   10.5     15      6
              6     10      9      3        8.5     34     12     14     11
              7     16     10      5         10     14     13     13    8.5
            8.5      7     11      2        6.5     19   11.5     13    6.5
              5    7.5     16    2.5          8     24   13.5   14.5   16.5
            5.5      8     18      2        7.5     10   13.5     15     21
            5.5   11.5     11      4         15     35   14.5     15     18
              7      9      9      3        9.5     43     38     71     14
             14     10      9    2.5       21.5   31.5     18     15     23
            5.5      9     11    2.5         15     10     14      9     17
              6     12    9.5      3         19     37     31     47     10
Total      79.5    116    148   48.5        135  302.5  201.5  264.5  149.5
Average     6.6    9.7   12.3      4       11.2   25.2   16.8     22   12.5
         Average, 8.2   Errors, 3          Average, 17.5   Errors, 21
         Coefficient, .65                  Coefficient, .24

                       Ia.                                 IIa.
            (a)    (b)    (c)    (d)        (a)    (b)    (c)    (d)    (e)
             26     24   23.5      8          9     19      8     26    5.5
              7      9     18    6.5          9     44     13     22     11
              7    6.5     17      9         12     72     23     38     11
              8      8   16.5      2          8     48     12     28      7
              9      9     11      3         17     25   12.5     14     11
             12      9   32.5     20         18     40     12   17.5     14
             13      9     14      5         12     40     11     16     23
              6      7    18.      9       17.5     46     13     14     12
              7    6.5     27      7         14     30     11     12     14
             21     12     11      3          9     23    8.5      8     10
             19   14.5     44   13.5         14     33      8     13      8
            8.5   10.5    8.5      2
Total     143.5    125  251.5     88      148.5    436  143.5  227.5  133.5
Average    11.9   10.4   20.9    7.3       12.4   36.3   11.9   18.9   11.1
         Average, 11.9   Errors, 5         Average, 18.1   Errors, 28
         Coefficient, .36                  Coefficient, .21

                       III.                         IIIa.
            (a)    (b)    (c)    (d)        (a)    (b)    (c)    (d)
            [?]      9   13.5      2          8    8.5     10      3
            6.5   10.5     26     15        7.5     11     20      3
              6     20   19.5    7.5         10     15     27     13
             15   12.5     27   19.5        7.5     21     12    3.5
              7      8     11      3          7     11   11.5    4.5
              6     12   14.5      6          8      9     31      4
            6.5      9     11    2.5         11      8     13      4
              6   10.5   13.5    2.5          6      8     16    2.5
             10     11     11      2         16     10     11      3
              8     10     20    6.5       13.5     12     22     14
              7     14     23      7          7      7    9.5      3
                                             10      8     12    2.5
Total      83.5  126.5    190   73.5      121.5  128.5    195     60
Average     7.6   11.5   17.3    6.7       10.1   10.7   16.2      5
         Average, 9.9   Errors, 8          Average, 10.6   Errors, 3
         Coefficient, .38                  Coefficient, .43


Average loss, 18.5 per cent. 


Tabulations made from the type of test above described are shown 
in Table I, Parts II and IIa. Referring to the column marked (a),
it is to be understood as made up of the times for the perceptions of the
first of the five words in the distant plane (1 to B); column b, the times
for the similar reactions to the remaining four words of the series 
(B-B-B-B); c, the times for the adjustments involved in shifting from 
the distant to the near plane (B to A); d, the times for the reading of 
the remaining four words of the near series (A-A-A-A); e, the times for 
the perception of the single word in the distant plane following a change 
in stimuli (2 to C). Averages of the columns are also given as is also 
an average of the averages. It is to be noted that columns b and d are 
for the intervals involved in reacting to four successive stimuli so that 
as a result the final average of this type of test is not comparable 
with that of the 2-1. It would be possible to measure each reaction 
separately, but the manner in which the measures were utilized made 
this unnecessary. 

Another important factor in the test remains to be noted. In 
order to place the muscles of accommodation upon their near-limit, 
and thus make fatigue effects the more quickly and certainly manifest, 
the subject wore, during the test, minus lenses of a refracting power 
sufficiently great just to enable adjustments in the two planes. A feel- 
ing of effort accompanies such a condition. The lenses used were 
determined by careful experimentation with each subject. This was 
usually done in connection with the preliminary trials with the 
apparatus. 

It was essential that the outcomes of the tests be reducible to a single
number which might serve as a measure of the ocular powers. Thus
only might valid comparisons be made. Note has been made of the
fact that the tests involve both speed (rapidity in fixations and accom-
modations) and acuity (correctness of perceptions). Measures of each
were combined into a single number in a manner next to be explained.

Since both factors are clearly involved in effective ocular activities, 
the final measure must reflect variations in either one. This necessity 
was met by making the percentage of correct perceptions the numer- 
ator of a fraction whose denominator was the average fixation-accom- 
modation time (the average of the averages) expressed in hundredths 
of seconds. This ratio is obviously influenced by changes in either 
factor. It may be that one is more significant than the other, but as 
yet only speculation is possible on the point. The fraction, reduced to 
the form of a decimal, was called the fixation-accommodation coeffi-
cient. It seems, from the evidence collected, to be a fair means of
representing numerically the ocular capacities existing at the time of
the test. In Table I such coefficients are indicated for each type
of test used in an actual series of such measures. 
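
The derivation of the coefficient may be sketched as follows. The function name is hypothetical. The sketch assumes that the percentage of correct perceptions is reckoned on the whole number of words presented: 36 in a 2-1 test (24 distant and 12 near), and 132 in a 1-5-5 test, the latter figure being itself an assumption (twelve cycles of eleven words each). Times recorded in fifths of a second are multiplied by 20 to give hundredths.

```python
def fa_coefficient(n_stimuli, n_errors, avg_time_fifths):
    """Fixation-accommodation coefficient.

    Numerator: percentage of correct perceptions.
    Denominator: average fixation-accommodation time in hundredths of a second.
    """
    if n_stimuli <= 0 or avg_time_fifths <= 0:
        raise ValueError("stimulus count and average time must be positive")
    percent_correct = 100.0 * (n_stimuli - n_errors) / n_stimuli
    avg_time_hundredths = avg_time_fifths * 20.0  # one fifth = 20 hundredths
    return percent_correct / avg_time_hundredths


# Part Ia of Table I: average 11.9 fifths, 5 errors (36 words assumed).
print(round(fa_coefficient(36, 5, 11.9), 2))    # 0.36

# Part II of Table I: average 17.5 fifths, 21 errors (132 words assumed).
print(round(fa_coefficient(132, 21, 17.5), 2))  # 0.24
```

Under these assumptions the two results agree with the coefficients of .36 and .24 entered in the table for those parts.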

It became apparent as the result of repeated applications of the 
tests to individual subjects, that each tends to exhibit a characteristic 
fixation-accommodation time (within limits). Since this is a factor 
in determining the value of the coefficient, it is clear that comparisons 
for the purpose of determining relative gains or losses are valid only 
when made between coefficients belonging to the same individual. 
Initial, or native, capacities may be represented with at least a certain 
degree of validity by means of the coefficient. Because of such 
differences valid comparisons for the purpose of noting changes can be 
made only between individual scores. Furthermore, the fact that the 
average of the 1-5-5 type was based upon the use of a single measure 
for the reading of the four words in both near and distant planes gave 
to the coefficient derived therefrom a value not at all directly com- 
parable with that from the 2-1 type. Comparisons can be made only 
between tests of the same type. The manner in which this was done is 
explained below. 


APPLICATION OF THE TESTS 


After the tests were developed it became advisable to accumulate 
evidence as to their reliability. This was done by applying them to
subjects before and after ocular fatiguing experiences and noting the 
manner in which the results compared with the expectations. Owing 
to limited space, only a brief account can here be given concerning 
either the detailed tests or the applications to individual subjects. 
The following account, however, may serve to indicate the nature of the 
work done, and the general conclusions may be accepted as based upon 
the evidence. The number of cases as yet investigated is too limited 
to yield highly significant conclusions. The investigation was con- 
cerned primarily with the evolution of a method, and not so directly 
with the development of a body of facts based upon its application. 
However, certain facts seem obvious from the cases studied, and they 
are stated in the concluding paragraphs. We shall next describe the 
manner in which the tests were applied. 

At some time following the preliminary experience with the 
apparatus, and the determination of the lenses to be worn, a battery 
of tests was given to the subject as follows: A 2-1, a 1-5-5, and a 2-1
type test in the order named. If it were desired that ocular fatigue be
induced, for the purpose of attempting to measure it, the subject was
immediately placed in an illumination so low as to place a severe
burden upon the eyes, and in it he read for a stated period. The degree 
of illumination was measured by an illuminometer. In other cases, 
normal conditions of illumination prevailed, according to the purpose 
of the investigation. In either event, the reading was followed by the 
administration of another battery similar to the first and given under 
identical conditions of illumination, etc. For each test the coefficient 
was then determined. Those preceding the reading were called ante- 
cedent tests, and those following, subsequent. The coefficients of the 
antecedent series were then compared with the corresponding mem- 
ber of the subsequent and the percentage of gain or loss, as evidenced 
by the coefficients, was determined for each pair. The average of the 
three results (summed algebraically) was used as the most likely 
measure of the losses or gains. Reference to Table I will clarify the 
procedure. 
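
The comparison of antecedent with subsequent coefficients may be sketched as follows. The sketch assumes that each percentage is reckoned on the antecedent coefficient, which the account above does not state explicitly. The function names are hypothetical and the coefficients are illustrative values, not those of Table I.

```python
def percent_change(antecedent, subsequent):
    """Percentage gain (positive) or loss (negative), reckoned on the antecedent."""
    if antecedent <= 0:
        raise ValueError("antecedent coefficient must be positive")
    return 100.0 * (subsequent - antecedent) / antecedent


def average_change(pairs):
    """Algebraic average of the percentage changes for a battery of paired tests."""
    changes = [percent_change(before, after) for before, after in pairs]
    return sum(changes) / len(changes)


# Illustrative battery: 2-1, 1-5-5 and 2-1 tests, each as (antecedent, subsequent).
battery = [(0.50, 0.40), (0.20, 0.18), (0.40, 0.44)]
for before, after in battery:
    print(f"{before:.2f} -> {after:.2f}: {percent_change(before, after):+.1f} per cent")
print(f"average: {average_change(battery):+.1f} per cent")
```

Because the results are summed algebraically, a gain in one pair offsets losses in the others: here losses of 20 and 10 per cent and a gain of 10 per cent give an average loss of about 6.7 per cent.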

As an illustration of the manner in which the tests were used, and 
also for the purpose of presenting some of the evidence on which the 
conclusions are based, the following typical cases will be cited. 


Subject Ob. Positive Accommodation, 3.5 diopters


Test 1. Read 1 hour 15 minutes in .8 f.c. illumination. Loss 10.8 per cent 
Test 2. Read 1 hour 55 minutes in .5 f.c. illumination. Loss 11.8 per cent 
Test 3. Read 1 hour 10 minutes in 28 f.c. illumination. Loss 19.9 per cent 


Test 3 was made at 10:00 a. m. The subject reported that his 
eyes “felt tired” due to late reading the night preceding by way of
preparation for an examination. This may explain the large loss fol- 
lowing reading in a favorable illumination. Evidently the fatigue 
of the excessive use of the eyes carried over until the following day. 
Another case, almost identical, but with even greater losses, adds to the 
validity of the explanation used. 


Subject Ph. Positive Accommodation, 2 diopters 


Test 1. Read 1 hour 10 minutes 1 f.c. illumination. Loss 18.5 per cent
Test 2. Read 1 hour 40 minutes .1 f.c. illumination. Loss 18.6 per cent
Test 3. Read 1 hour 33 to 76 f.c. illumination. Gain 13.7 per cent 


The gain in Test 3 is to be noted. It was made early in the morn- 
ing before any study had been done. It is possible that a sufficiently 
pronounced “warming up” was not given before the antecedent
series. This subject in these and other tests exhibited a marked sus- 
ceptibility to fatiguing conditions. Note should be made also that 
both he, and several others, reported a peculiar rhythmic appearance
and disappearance of the word which served as stimulus following
fatiguing experiences. The effort to see clearly was frequently of
little avail. The perception came “on the fly.” Others described
the sensation as like that which would result if the words were on a
spring bobbing slowly forward and backward. This was due, un-
doubtedly, to the rhythmic contraction and relaxation of the ciliaris.


Subject Br. Positive Accommodation, 2.25 diopters 


Test 1. Read 2 hours 15 minutes 1 f.c. illumination. Loss 18 per cent 
Test 2. Read 1 hour 40 minutes .5 f.c. illumination. Loss 12 per cent 


This subject had a very high-pitched and feeble voice. It was 
possible for her, however, after a little training, to speak against the 
voice-key in such a manner as to make good records.


Subject O. Positive Accommodation, 3 diopters 


Test 1. Read 1 hour 10 minutes 1 f.c. illumination. Loss 23 per cent 
Test 2. Read 1 hour 10 minutes 68 f.c. illumination. Loss 7.5 per cent 


This subject lost even in good illumination. He has excellent 
vision, but is susceptible to fatigue. He reported trouble with his 
eyes whenever he has had to read under poor illumination. 


Subject Her. Positive Accommodation, 3 diopters 


Test 1. Read 55 minutes .5 f.c. illumination. Gain 61 per cent 
Test 2. Read 1 hour .3 f.c. illumination. Gain 14.6 per cent 
Test 3. Read 2 hours 40 minutes .2 f.c. illumination. Gain 10.8 per cent 


That a gain occurred in each test is noteworthy.
It is to be observed, however, that with an increase in the reading 
period and a decrease in the illumination, the amount of gain is cor- 
respondingly lessened. There is evidence that the gain in power is 
genuine. The case next to be presented is similar. 


Subject Kl. Positive Accommodation, 4.5 diopters 


Test 1. Read 1 hour 15 minutes .2 f.c. illumination. No change 
Test 2. Read 2 hours 1 to 1.5 f.c. illumination. Gain, 5 per cent 


This subject, like the preceding, manifested a marked resistance to 
fatiguing conditions. A test other than the two here reported was 
made by him, and yielded similar results. 

It is appropriate to add at this point certain further comments 
concerning the general nature of the results. 

It is apparent that there is a wide variation in the susceptibility 
of subjects to fatiguing conditions. Some, of which the first four
cases presented may be chosen as typical, tend to show a marked
falling off after relatively short reading under low illuminations. 
Others, such as the last two cases, are highly resistant. In three of 
our cases well marked gains occurred after periods of reading under 
conditions which one would expect to reduce greatly the ocular powers. 
To the former type we have given the term “minus” since the losses
are immediate and marked; to the latter, the term “plus” since there
is frequently a gain following the fatiguing conditions. These two 
represent the extremes of a varying series. That there is an actual 
increase in ocular capacities is evidenced not only by the tests, but 
also from subjective evidence. One subject in particular reported a 
feeling of exhilaration and ability to distinguish words which before 
the reading could not be seen clearly. It may rightly be argued that
this sensation is itself good evidence for fatigue. That the eyes were 
not impaired in their functions is, however, the important fact. 

That these differences are not ascribable to initially higher powers 
on the part of the one type is evidenced by a consideration of the 
coefficients of the antecedent series of the two types. Such a com- 
parison is shown in the following figures: 


                     Average Coefficients        Average Coefficients
                     Antecedent 2-1 tests        Antecedent 1-5-5 tests
Plus types.......      .44      .62      .46
Minus types......      .57      .60      .54


From this it appears that no strikingly superior initial capacities
belong to the more resistant individual.

These extremes differ with respect to one characteristic which may 
in part account for the results they yield. If the average deviation of 
each of the series of measures (i.e., the columns) be determined, and
that of the antecedent then compared with the corresponding subse- 
quent element, it is found that the minus type tends to exhibit a more 
pronounced increase in variability. The coordinations are obviously 
less perfectly controlled. Such a comparison is presented below. 

Plus types: 22 paired measures, of which 13 show decreases, 9 increases.
Minus types: 33 paired measures, of which 14 show decreases, 19 increases
in the average deviations.

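The comparison of average deviations for a single pair of columns may be sketched as follows. The sketch assumes that "average deviation" means the mean of the absolute deviations from the column mean. The function name is hypothetical. The readings are column a of Parts I and Ia of Table I, taken here as an antecedent and a subsequent series; one entry of Ia that is illegible in the print is read as 8, in agreement with the column total.

```python
from statistics import mean


def average_deviation(values):
    """Mean absolute deviation of the values from their own mean."""
    centre = mean(values)
    return mean([abs(v - centre) for v in values])


# Column a of Part I (antecedent) and Part Ia (subsequent), in fifths of a second.
antecedent = [4.5, 5, 6, 7, 8.5, 5, 5.5, 5.5, 7, 14, 5.5, 6]
subsequent = [26, 7, 7, 8, 9, 12, 13, 6, 7, 21, 19, 8.5]  # fourth entry read as 8

before = average_deviation(antecedent)
after = average_deviation(subsequent)
print(f"antecedent: {before:.2f}  subsequent: {after:.2f}")
print("increase" if after > before else "decrease or no change")
```

Each such pair of columns furnishes one of the "paired measures" counted above.
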
Comparisons were also made as to the effect of fatigue upon the 
coordinations involved in the adjustments for the shifts in the two 
directions, toward and away from the subject. Here again it appears
that the minus type exhibits the more marked increases in average
deviations, especially in the shift from the near to the distant plane. 
This difference may indicate nervous instability, or actual muscular 
incapacity. It is conceivable too, that central factors are concerned. 
Possibly all are. 

Without attempting to present the detailed evidence upon which 
they are founded, a summary of the results of the application of the 
tests, as well as the implications from the conditions under which they 
were given, may be brought together in the following 


GENERAL CONCLUSIONS 


1. Ocular fatigue may be measured by the speed of shifts in fixation 
from a near to a more distant plane, and by the accuracy of the per- 
ceptions, the process being accompanied by the wearing of minus 
lenses fittingly chosen. The fixation-accommodation coefficient 
derived from the records thus made is evidently a fair measure of 
ocular capacities. 

2. The validity of the method is evidenced in that the results 
obtained by its use correspond in the main with the expectations. 

3. From the cases investigated it appears that ocular powers range 
from those (1) in whom a high resistance to fatiguing conditions 
prevails to those (2) in whom a ready susceptibility exists. These 
extreme types we have called the “plus” and “minus.”

4. Individuals differ greatly with respect to their average fixation- 
accommodation times. A characteristic average time (within limits) 
appears to exist. 

5. Fatigue tends to increase the variableness of the muscular 
adjustments in the less resistant types.